The Haar Wavelet Transform of a Dendrogram: 
Additional Notes 

Fionn Murtagh* 
February 1, 2008 

Abstract 

We consider the wavelet transform of a finite, rooted, node-ranked, 
p-way tree, focusing on the case of binary (p = 2) trees. We study
a Haar wavelet transform on this tree. Wavelet transforms allow for 
multiresolution analysis through translation and dilation of a wavelet 
function. We explore how this works in our tree context. 

Keywords: Haar wavelet transform; binary tree; ultrametric topology; p- 
adic numbers; hierarchical clustering; data mining; local field; abelian group. 

1 Introduction 

In a companion paper, which we will refer to as Paper I ("The Haar wavelet
transform of a dendrogram"), a new transform is applied to a hierarchical 
clustering. Various examples are given of uses of this transform, prior to 
applying the inverse transform. 

In this paper, we look at linkages with other ways of understanding the 
wavelet transform, with the classical demarche described in Appendix 1. 
Our aim is to understand the wavelet transform when applied to hierarchi- 
cal clustering dendrograms (where notation and expression as ultrametric 
topology are summarized in Appendix 2). 

After all, both the wavelet transform and hierarchical clustering aspire
to multiresolution or multiscale analysis. The natural question is then: how 
do they differ and are there different aspects that they bring to the data 
analysis task? 

A good deal of recent work on wavelet transform has been through group
theoretic approaches. 

Foote et al. (2000a) point to how group theoretic understanding can 
lead to a "wealth of new analysis filters" (in the context of multiresolu- 
tion signal and image analysis). The same point is made by the SMART 
project (SMART, 2005), including the chance to have automatic generation
of new transform algorithms. Believing that algorithms should be developed
if and only if there is a verifiable user need for them, we would instead
point to another reason why group theoretic understanding is crucial to data
analysis. A great deal of observed reality can be understood by way of
observed symmetries, and groups summarize and encapsulate the properties
of these symmetries. For time evolving phenomena, therefore, or spatial co- 
ordinate referenced phenomena, it may be possible to replace analysis that is 
time-referenced or referenced to particular coordinate systems with a more 
general, more generic, symmetry analysis. This is the vision opened up by 
the study of group actions on a set of objects. 

Parenthetically, one fascinating way in which this works can be seen in
Cendra and Marsden (2003). The authors develop (i) an analytic theory of
dynamics as functions of spatial and temporal coordinates; and (ii) group 
theoretic interpretation of, in parts of the study, return or phase maps. 

Our approach can be stated as follows. Let $U$ be an ultrametric space,
associated with an $m$-dimensional embedding, $\mathbb{R}^m$. We note that an
ultrametric space is necessarily of 0 dimensionality; and that minimal
dimensionality real embedding of an ultrametric has been studied by Lemin and
others (Lemin, 2001; Bartal et al., 2004).

The partial order of (clopen) set inclusion is denoted by the binary tree
or hierarchy, $H$. Consider the group action comprising rotations or cyclic
permutations (these are equivalent) of subnodes of any node in $H$, and we
will denote this group as $G_H$. Then we study the wavelet transform of $L^2(U)$
resulting from the actions of group $G_H$.

Having already discussed the new wavelet transform in Paper I, we can 
give one result relating to it in the context of the group of equivalent repre- 
sentations of H as follows. 

Theorem: For all $2^{n-1}$ equivalent representations of $H$ (here: unlabeled
graph isomorphisms), the dendrogram Haar wavelet transform is unique.

The proof follows from the definition of the wavelet coefficients at each
level, $\nu$; whereas the equivalent representations of $H$ are intra-level.

It follows from this theorem that we have a unique matrix representation 
of a dendrogram. 

2 Previous Work on Wavelet Transforms of Data
Tables 



In this section we will review recent work using wavelet transforms on data 
tables, and show how our work represents a radically new approach to tack- 
ling similar objectives. 

Approximate query processing arises when data must be kept confiden- 
tial so that only aggregate or macro-level data can be divulged. Approxi- 
mate query processing also provides a solution to access of information from 
massive data tables. 

One approach to approximate database querying through aggregates is 
sampling. However a join operation applied to two uniform random samples 
results in a non-uniform result, which furthermore is sparse (Chakrabarti, 
Garofalakis, Rastogi and Shim, 2001). A second approach is to keep his- 
tograms on the coordinates. For a multidimensional feature space, one is 
faced with a "curse of dimensionality" as the dimensionality grows. A third 
approach is wavelet-based, and is of interest to us in this article. 

A form of progressive access to the data is sought, such that aggregated 
data can be obtained first, followed by greater refinement of the data. The 
Haar wavelet transform is a favored transform for such purposes, given that 
reconstructed data at a given resolution level is simply a recursively defined 
mean of data values. Vitter and Wang (1999) consider the combinatorial 
aspects of data access using a Haar wavelet transform, and based on a multi- 
way data hypercube. Such data, containing scores or frequencies, is often 
found in the commercial data mining context of OLAP, On-Line Analytical 
Processing. 

As pointed out in Chakrabarti et al. (2001), one can treat multidimensional
feature hypercubes as a type of high dimensional image, taking the
given order of feature dimensions as fixed. As an alternative a uniform "shift 
and distribute" randomization can be used (Chakrabarti et al., 2001). 

There are problems, however, in directly applying a wavelet transform
to a data table. Essentially, a relational table (to use database terminology; 
or matrix) is treated in the same way as a 2-dimensional pixelated image, 
although the former case is invariant under row and column permutation, 
whereas the latter case is not (Murtagh, Starck and Berry, 2000). Therefore 
there are immediate problems related to non-uniqueness, and data order 
dependence. 
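
The order dependence can be illustrated with a minimal sketch. The function name `haar_step` and the data values below are illustrative assumptions, not taken from the cited work. One level of the classical Haar transform pairs adjacent values, so permuting the input changes the coefficients even though the set of values is unchanged.

```python
def haar_step(values):
    """One level of the classical (unnormalized) Haar transform.

    Adjacent pairs (x0, x1), (x2, x3), ... are replaced by their means
    (smooth) and half-differences (detail). Assumes an even-length input.
    """
    pairs = list(zip(values[0::2], values[1::2]))
    smooth = [(a + b) / 2 for a, b in pairs]
    detail = [(a - b) / 2 for a, b in pairs]
    return smooth, detail


# Hypothetical column of a data table, and the same values reordered.
column = [1.0, 1.0, 9.0, 9.0]
permuted = [1.0, 9.0, 1.0, 9.0]

smooth_a, detail_a = haar_step(column)    # similar values are adjacent
smooth_b, detail_b = haar_step(permuted)  # similar values are separated
```

With similar values adjacent, the details vanish; after the permutation they do not. This is the motivation for first ordering the data so that adjacency is meaningful.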

What if, however, one organizes the data such that adjacency has a meaning?
This implies that similarly-valued objects, and/or similarly-valued features,
are close together. This is what we do, using any hierarchical clustering
algorithm (e.g., the Ward or minimum variance one).

Without loss of generality, as seen in these figures, we assume that a 
hierarchy is a binary, rooted tree; and equivalently that the series of ag- 
glomerations involve precisely two clusters (possibly singleton clusters) at 
each of the n - 1 agglomerations, where there are n observations. These n
observations are usually represented by n row vectors in our data table. 

A significant advantage in regard to hierarchical clustering is that parti- 
tions of the data can be read off at a succession of levels, and this obviates 
the need for fixing the number of clusters in advance. All possible clustering 
outcomes are considered. (Remark: of course, relative to any one of the 
commonly used cluster homogeneity criteria, each partition is guaranteed to 
be sub-optimal at best.) 

3 The Haar Wavelet Transform of a Dendrogram: 
Summary 

In this article, we will denote the agglomeration of two clusters, $q$ and $q'$, as
cluster $q''$. So (left or right subtree) nodes in the dendrogram are associated
with the child (elder or younger) subnodes. We can define the elder cluster
as $q$ such that $\nu(q) > \nu(q')$, but we will not be concerned with whether or
not elder corresponds to left, and younger to right.

For $n$ objects or observation vectors, another notation that we can use
is that the hierarchy $H$ is the set of clusters indexed from 1 to $n - 1$: $H =
\{q_1, q_2, \ldots, q_{n-1}\}$. We will always assume in this article, for convenience of
exposition and with little loss of generality, that for distinct clusters
$\nu(q) \neq \nu(q')$.

Whenever the distinction between the following becomes important, we
will clearly distinguish between them: clusters; nodes; sets of objects; sets 
of indices of objects; and p-adic number representation of indices of objects. 

The Haar algorithm, as discussed in Paper I, is as follows: 

1. Take each cluster $q''$ in turn, proceeding in sequence through $q'' \in
\{q_1, q_2, \ldots, q_{n-1}\}$.

2. Apply the smoothing function, $s$: $s(q'') = \frac{1}{2}(q + q')$.

3. Thereby apply the detail function, $d$: $d(q'') = s(q'') - s(q') = -(s(q'') - s(q))$.

4. Return to step 1 until all $n - 1$ clusters are processed.

For details of how the clusters also take terminal nodes (objects) into
account, see Paper I. 
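
The four steps above can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation of Paper I. The function name `dendrogram_haar` and the merge encoding are hypothetical: terminals are numbered 0 to n - 1, and the k-th agglomeration creates cluster n + k. The smooth of a terminal is assumed to be its observation vector.

```python
def dendrogram_haar(data, merges):
    """Forward Haar transform on a binary dendrogram (illustrative sketch).

    data   : list of n observation vectors (lists of floats).
    merges : list of n - 1 pairs (q, q_prime); ids 0..n-1 are terminals and
             id n + k is the cluster created by the k-th merge (an assumed
             encoding, chosen here for convenience).
    Returns (details, final_smooth), with details[k] = d(q'') for merge k.
    """
    n = len(data)
    # Assumption: the smooth of a terminal is the observation itself.
    smooth = {i: list(vec) for i, vec in enumerate(data)}
    details = []
    for k, (q, q_prime) in enumerate(merges):
        s_q, s_qp = smooth[q], smooth[q_prime]
        # Step 2: the smooth of the new cluster is the mean of its subnodes.
        s_new = [(a + b) / 2 for a, b in zip(s_q, s_qp)]
        # Step 3: detail d(q'') = s(q'') - s(q') = -(s(q'') - s(q)).
        details.append([c - b for c, b in zip(s_new, s_qp)])
        smooth[n + k] = s_new
    return details, smooth[n + len(merges) - 1]


# Hypothetical example: 4 one-dimensional observations.
data = [[1.0], [3.0], [6.0], [10.0]]
merges = [(0, 1), (2, 3), (4, 5)]
details, final_smooth = dendrogram_haar(data, merges)
```

The output consists of n - 1 detail vectors and one final smooth, which together hold as many vectors as the n input observations.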

Now, it is clear from construction that perfect reconstruction of the
input data (alternatively expressed, perfect undoing of the foregoing Haar
algorithm) is guaranteed, given all of the following: (i) all of the detail
function values, (ii) the final smooth, $s(q_{n-1})$, (iii) the definition of the
dendrogram, and (iv) a convention of left and right subtree that allows us
to traverse down the tree from $q''$ to both $q$ and $q'$.

In practice our objectives are to explore the foundations of two distinct 
approaches. Both seek a Haar wavelet basis. These two approaches are 
as follows and can express the two input data cases considered in section 4.3
("The Input Data") of Paper I. 

• Wavelet transform in an ultrametric topology: Induce the Haar basis
from the hierarchy H that expresses the relationships in a set of
ultrametrically related points, I.

• Wavelet transform on embedded subsets: Induce the Haar basis from 
the hierarchy H defining a set of subsets of I. 

In the ultrametric case, each point $i \in I$ defines an $m$-dimensional vector:
$i \in \mathbb{R}^m$. For notational convenience therefore $i$ is either the index, or a
vector.

In the set of subsets case, each point $i \in I$ can be defined as an
$n$-dimensional index vector. Thus for example the sequentially second point
is defined as $(0, 1, 0, 0, \ldots, 0)$.

Both practical cases above can be expressed as follows: we carry out a
wavelet transform in $L^2(G)$ where $G$ is the group of alternative
representations of a given hierarchy, $H$. The points $i \in I$ are associated either with
vectors in $\mathbb{R}^m$ or with an orthonormal vector set in $\mathbb{Z}^n$. (Note how $\mathbb{Z}^n$ is
$n$-dimensional, whereas $\mathbb{R}^m$ is $m$-dimensional. The cardinality of $I$ is $n$. The
dimensionality of a feature or attribute space is $m$.)

4 Wavelet Transform on Discrete Fields 

In this section we look at the wavelet transform on discrete fields, and in
particular on $\ell^2(\mathbb{Z}_p)$. This is realized through cyclic representations of the
affine group of $\mathbb{Z}$ or of $\mathbb{Z}_p$.

In wavelets, we are seeking a representation of our data which has
"covariant" properties relative to scale: for example, for translation
"covariance", the representation of a shifted signal must be a shifted copy of the
representation of the signal (Torresani, 1994). In the group theory
perspective, such "covariance" properties (i.e. with respect to the action of a
symmetry group) are the starting point, and the representation is to be de- 
rived from them. The "covariance" group "turns out to be isomorphic (up 
to a compact factor) to the geometric phase space of the representation" 
(Torresani, 1994, p. 6). 

Traditionally, the wavelet transform is covariant with respect to a group 
action applicable to images, signals, time series, etc., viz. the affine group 
of the real line, which is a continuous group (Antoine et al., 2000a). Thus a 
first task is to bypass the need for a continuous group. 

The group law of the affine group, in generic form $ax + b$, generates
translations and dilations. The action of the $ax + b$ group on $\mathbb{R}$ means:
$(a, b) : x \mapsto ax + b$. Writing group elements in the order $(b, a)$, we have
the following product:

$$(b, a) \cdot (b', a') = (b + ab', aa') \tag{1}$$

Written in the order $(a, b)$, the identity is $(1, 0)$ and the inverse of
$(a, b)$ is $(a, b)^{-1} = (a^{-1}, -b/a)$. This is a non-commutative Lie (and
thus continuous) group.
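
The group law can be checked numerically with a small sketch. The function names `compose`, `inverse` and `act` are illustrative assumptions. Pairs are written here in the $(b, a)$ ordering of relation (1), so the identity appears as the pair with $b = 0$ and $a = 1$. Exact rational arithmetic is used to avoid rounding.

```python
from fractions import Fraction


def compose(g, h):
    """Product (b, a) . (b', a') = (b + a b', a a') of relation (1)."""
    (b, a), (b2, a2) = g, h
    return (b + a * b2, a * a2)


def inverse(g):
    """Inverse in the same (b, a) ordering: (-b / a, 1 / a), with a != 0."""
    b, a = g
    return (-b / a, 1 / a)


def act(g, x):
    """Action of the element (b, a) on a point x of the line: x -> a x + b."""
    b, a = g
    return a * x + b


IDENTITY = (Fraction(0), Fraction(1))  # translation 0, dilation 1

g = (Fraction(3), Fraction(2))   # x -> 2x + 3
h = (Fraction(-1), Fraction(5))  # x -> 5x - 1
```

Composing `g` and `h` in the two possible orders gives different elements, which is the non-commutativity noted in the text, while `compose(g, h)` acts on a point exactly as applying `h` and then `g`.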

Flornes et al. (1994) consider a discrete wavelet transform to begin with,
specified on the Hilbert space $\ell^2(\mathbb{Z}_p)$ (where $\mathbb{Z}$ are the integers, $\mathbb{Z}_p$ are
integers mod $p$ where $p$ is prime for reasons explained below; and $\ell^2$ implies
finite energy from discrete values, or being square integrable). The Haar
measure is defined on locally compact groups, permitting integration over
group actions or members; and a locally compact separable group is
consistent with the square integrable property. The group at issue here is the
cyclic representation of the affine group; or its finite analog, the affine group
mod $p$ (Foote et al., 2000a).

A discussion of square integrable group representations in the context of 
time-frequency transforms, including the continuous wavelet transform, can 
be found in Torresani (2000); and Torresani (1994) discusses the
counterexample case of the rotation group, $S^2$, on the 2-dimensional sphere, which
gives rise to a representation which is not square integrable. 

The wavelet transform considered by Flornes et al. (1994) has the fol- 
lowing operations: 



$$\text{Translation:} \quad T_b f(n) = f(n - b) \quad \text{for } f \in \ell^2(\mathbb{Z}_p) \tag{2}$$

$$\text{Dilation:} \quad D_a f(n) = f(a^{-1} n) \tag{3}$$

The reason why $p$ has to be prime is as follows. Consider the $p$-adic
number representation of $\mathbb{Z}_p$. For the $p$-adic representation of $\mathbb{Z}_p$ to be a
field, i.e. for every nonzero element to have a multiplicative inverse, $p$
must be prime.

The unitary representation of a group $G$ is a mapping into a (complex)
Hilbert space. Flornes et al. (1994) define the following unitary
representation, $\pi$, on the group, mapping into unitary operations on $\ell^2(\mathbb{Z}_p)$:

$$\pi(g) f(n) = f(a^{-1}(n - b))$$

In terms of the translation and dilation operators, we have:

$$\pi(b, a) = T_b D_a$$
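
A minimal sketch of these operators on integers mod a prime is given below. The names `translate`, `dilate` and `pi`, the choice $p = 7$ and the sample values are illustrative assumptions. A function on the integers mod $p$ is stored as a list of $p$ values. The modular inverse uses Python's built-in `pow(a, -1, p)`, available from Python 3.8.

```python
P = 7  # a small prime, chosen for illustration


def translate(f, b, p=P):
    """(T_b f)(n) = f(n - b), with indices taken mod p."""
    return [f[(n - b) % p] for n in range(p)]


def dilate(f, a, p=P):
    """(D_a f)(n) = f(a^{-1} n), with a^{-1} the inverse of a mod p."""
    a_inv = pow(a, -1, p)  # exists for every a not divisible by the prime p
    return [f[(a_inv * n) % p] for n in range(p)]


def pi(b, a, f, p=P):
    """pi(b, a) f = T_b D_a f, i.e. n -> f(a^{-1} (n - b))."""
    return translate(dilate(f, a, p), b, p)


f = [0, 1, 4, 9, 16, 25, 36]  # arbitrary sample values on the integers mod 7
```

Each operator only permutes the entries of `f`, so the sum of squares is preserved, and applying `pi(b, a, ...)` after `pi(b2, a2, ...)` agrees with `pi(b + a * b2, a * a2, ...)` taken mod $p$. For a composite modulus, `pow(a, -1, p)` raises `ValueError` whenever `a` shares a factor with the modulus, which is the practical face of the requirement that $p$ be prime.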

Thus far, a purely discrete wavelet transform is at issue. However if we 
take our function values / defined on Zp as sampled values from a continuous 
signal, then problems of interpolation arise. It simply is not good enough 
to transform our discrete data independently of awareness of the underlying 
continuum. Note that this issue is at the nub of where data analysis differs 
from signal processing. In data analysis, mostly a data cloud is taken as 
given (potentially leading to a combinatorial perspective) or as a stochastic 
realization (leading to a statistical modeling). In signal processing, the 
observed data are samples of an underlying topological or other continuous 
structure. It is a prime objective in the signal processing context to keep 
the processing of the observed, sampled data fully and provably consistent 
with the underlying continuous structures. 

A way to address this issue of interpolation is to use B-spline filters (the
simplest example of which is the box function used by the Haar wavelet 
transform) to smooth the data, thereby "filling the holes" between gaps in 
the sampled values, before dilating. This is termed pseudo-dilation. 

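Relations (2) and (3) can be re-expressed as follows (Torresani, 1994):

$$\text{Translation:} \quad T_b f(n) = f(n - b) \quad \text{for } f \in \ell^2(\mathbb{Z}_p) \tag{4}$$

$$\text{Dilation:} \quad D_a f(n) = \begin{cases} f(a^{-1} n) & \text{if } a \text{ divides } n \\ 0 & \text{otherwise} \end{cases} \tag{5}$$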
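
The dilation with "holes" can be sketched on a finite window of the non-negative integers. This is an illustration only: the function name `dilate_with_holes` and the list storage are assumptions, and the smoothing that fills the holes is not shown.

```python
def dilate_with_holes(samples, a):
    """(D_a f)(n) = f(n / a) if a divides n, and 0 otherwise.

    samples[n] holds f(n) for n = 0 .. len(samples) - 1; the result covers
    n = 0 .. a * (len(samples) - 1). Assumes a is a positive integer.
    """
    out = [0] * (a * (len(samples) - 1) + 1)
    for m, value in enumerate(samples):
        out[a * m] = value  # positions not divisible by a stay at 0
    return out


# Hypothetical sample values f(0), f(1), f(2).
dilated = dilate_with_holes([1, 2, 3], 2)
```

The zeros left between the original samples are the gaps that a smoothing filter, such as the box function of the Haar case, would fill.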

Then the affine multiplication law is verified by $\{T_b, D_a\}$ (using relation
(1)):

$$T_b D_a T_{b'} D_{a'} = T_{b + ab'} D_{aa'} \tag{6}$$

When $f \in \ell^2(\mathbb{Z}_p)$ with $p$ prime, then $a$ always divides $n$. The affine
group on $\ell^2(\mathbb{Z}_p)$ is well-defined; $T_b$ and $D_a$ define a representation of this
group; and it turns out that the square integrable property holds.

In general for $f \in \ell^2(\mathbb{Z})$ the à trous algorithm is used incorporating both
continuously defined dilation, and discretization of the function.

We could embed each node of a hierarchical clustering, defined as we
always do as a binary, rooted tree, in $\mathbb{Z}_2$. Lang (1998) develops a wavelet
transform approach (including the Haar wavelet transform and others) for
such a 2-series local field, or Cantor dyadic group. Taking each cluster $q \in Q$
or node in the tree individually is not satisfactory from our point of view,
and so we look further for a more pleasing way to process a hierarchy.

5 Wavelet Bases on Local Fields 

Wavelet transform analysis is the determining of a "useful" basis for $L^2(\mathbb{R}^m)$
with the following properties:

• induced from a discrete subgroup of $\mathbb{R}^m$,

• using translations on this subgroup, and 

• dilations of the basis functions. 

Classically (Frazier, 1999; Debnath and Mikusiński, 1999; Strang and
Nguyen, 1996) the wavelet transform avails of a wavelet function $\psi(x) \in
L^2(\mathbb{R})$, where the latter is the space of all square integrable functions.
Wavelet transforms are bases on $L^2(\mathbb{R}^m)$, and the discrete lattice subgroup
$\mathbb{Z}^m$ is used to allow discrete groups of dilated translation operators to be
induced on $\mathbb{R}^m$. Discrete lattice subgroups are typical of 2D images (the
lattice is a pixelated grid) or 3D images (the lattice is the voxelated grid) 
or spectra or time series (the lattice is the set of time steps, or wavelength 
steps). 

Sometimes it is appropriate to consider the construction of wavelet bases
on $L^2(G)$ where $G$ is some group other than $\mathbb{R}$. In Foote, Mirchandani,
Rockmore, Healy and Olson (2000a, 2000b; see also Foote, 2005) this is
done for the group defined by a quadtree, in turn derived from a 2D image.
To consider the wavelet transform approach not in a Hilbert space but rather
in locally-defined and discrete spaces we have to change the specification of
a wavelet function in $L^2(\mathbb{R})$ and instead use $L^2(G)$.

Benedetto (2004) and Benedetto and Benedetto (2004) considered in detail
the group $G$ as a locally compact abelian group. Analogous to the integer
grid, $\mathbb{Z}^m$, a compact subgroup is used to allow a discrete group of operators
to be defined on $L^2(G)$. The property of locally compact (essentially: finite
and free of edges) abelian (viz., commutative) groups that is most impor- 
tant is the existence of the Haar measure (Ward, 1994). The Haar measure 
allows integration, and definition of a topology on the algebraic structure of 
the group. 

Benedetto (2004) considers the following cases, among others, of wavelet 
bases constructed via a sub-structure: 

• Wavelet basis on $L^2(\mathbb{R}^m)$ using translation operators defined on the
discrete lattice, $\mathbb{Z}^m$. This is the situation discussed above, which holds
for image processing, signal processing, most time series analysis (i.e.,
with equal length time steps), spectral signal processing, and so on. As
pointed out by Foote (2005), this framework allows the multiresolution
analysis in $L^2(\mathbb{R}^m)$ to be generalized to $L^p(\mathbb{R}^m)$ for Minkowski metric
$L^p$ other than Euclidean $L^2$.

• Wavelet basis on $L^2(\mathbb{Q}_p)$, where $\mathbb{Q}_p$ is the $p$-adic field, using a discrete
set of translation operators. This case has been studied by Kozyrev
(2002, 2004) and Altaisky (2004, 2005). See also the interesting overview of
Khrennikov and Kozyrev (2006).

• Wavelet basis on $L^2(\mathbb{Q}_p)$ using translation operators defined on the
compact, open subgroup $\mathbb{Z}_p$. (It is interesting to note that $\mathbb{Z}^m$ is
discrete; and that the quotient $\mathbb{R}^m/\mathbb{Z}^m$ is compact. In contrast to
this, $\mathbb{Z}_p$ is compact; and the quotient $\mathbb{Q}_p/\mathbb{Z}_p$ is discrete.)

• Also discussed is a wavelet basis on $L^2(G)$, for a group $G$, using translation
operators defined on a discrete subgroup, or discrete lattice.

• Finally the central theme of Benedetto (2004) is a wavelet basis on 
$L^2(G)$ where $G$ is a locally compact abelian group, using translation
operators defined on a compact open subgroup (or operators that can 
be used as such on a compact open subgroup); and with definition of 
an expansive automorphism replacing the traditional use of dilation. 

A motivation for the work of Benedetto (2004) and Benedetto and Benedetto
(2004) is laying the groundwork for the wavelet transform on the adelic
numbers (see Appendix 3). In this work we are content to be less ambitious in
regard to number systems: below we focus on a particular $p$-adic encoding
of dendrograms; and we are also less ambitious in regard to wavelet functions,
staying resolutely with the Haar wavelet in this work. Our motivation is
due to our application drivers.

Locally compact abelian groups (LCAG) are a way to take Fourier anal- 
ysis (hence a particularly important class of harmonic analysis because so 
versatile) into more general settings than e.g. the reals (although the reals 
also form a non-compact, but locally compact, abelian group). 

The duals of members of a locally compact abelian group, defined as
unitary multiplicative characters, $x \mapsto e^{-2\pi i x}$, form a locally compact
abelian group (Knapp, 1996). The duality pairing $G$ and $\hat{G}$ allows for an
isometry between $L^2(G)$ and $L^2(\hat{G})$ (Antoine et al., 2000).

Fourier analysis is the study of real square integrable functions that are 
invariant under the group of integer translations (see Foote et al., 2000a), 
while abstract harmonic analysis is the study of functions on more general 
topological groups that are invariant under a (closed) subgroup. 

It is interesting to compare some global properties of our approach rela- 
tive to the Fourier transform approach applied to decision trees in Kargupta 
and Park (2004). The Fourier transform lends itself well to a frequency spec- 
trum analysis of binary decision vectors, and the latter can be of importance 
for supervised classification. On the other hand, our work makes use of bi- 
nary trees but in the framework of unsupervised classification. The wavelet 
transform shares with the Fourier transform the property that frequency 
spectral information is determined from the data; and the wavelet trans- 
form additionally determines spatial or resolution scale information from 
the data. We have found the wavelet transform, as described in this article, 
to be appropriate for the type of input data that we have considered. In 
general terms, both we in this work, and Kargupta and Park (2004), have 
as objectives the filtering and compression of data. 

We need affine group action for the wavelet transform, and we have seen
above (section 4, "Wavelet Transform on Discrete Fields") that $\mathbb{Z}_p$ affords us
this; but for an arbitrary discrete field, and for an arbitrary locally compact
abelian group, it is tricky to find an affine group. Taking further the Flornes
et al. (1994) work, Antoine et al. (2000) consider an infinite locally compact
abelian group, $\mathcal{G}$; the restriction of $\mathcal{G}$ to a lattice $\Gamma \subset \mathcal{G}$; $\mathcal{A}$, an abelian
semigroup; and the actions of $\mathcal{A}$ on $\ell^2(\Gamma)$. Based on a pseudodilation (i.e.,
the product of a natural dilation by a convolution operator) the case of a
continuous underlying signal is studied, i.e. the relation between the
semigroup acting on $\ell^2(\Gamma)$, and a continuous affine group acting on $L^2(\mathbb{R})$. Spline
functions are again among the wavelet functions used (among which is the
Haar wavelet function associated with the B-spline of order 1).

The aspect of greatest interest to us here in the approach of Benedetto
(2004), and Benedetto and Benedetto (2004), is to define wavelets on $L^2(G)$,
with $G$ taken as the $p$-adic rationals, $\mathbb{Q}_p$, and with the $p$-adic integers, $\mathbb{Z}_p$,
on which we define translation-like operators. Firstly, we are using therefore
$L^2(\mathbb{Q}_p)$, i.e. functions defined on the $p$-adic rationals. Secondly, we use a discrete
group of operators on $L^2(\mathbb{Q}_p)$ which are not in themselves translation
operators, but may be used in an analogous way. The "trick" used is that the
quotient $\mathbb{Q}_p/\mathbb{Z}_p$ is discrete, and this will furnish the translation operators.
An expansive automorphism is also needed in this context, i.e. what we use
in analogy with dilation.

A number of alternatives for the subset of $\mathbb{Q}_p/\mathbb{Z}_p$ (more strictly the
quotient of the group dual by the annihilator in the dual of the compact open
subgroup) are discussed by Benedetto (2000a). Given our application-driven
interest, we will not pursue them further here. What we will do, however, is
to look at how the group-based approach of Benedetto (2000a), that for the
most part assumes infinite sets, can be tailored for our algorithmic, hence
finite, purposes.

We will therefore look at how we can suitably encode any given dendrogram
in terms of $\mathbb{Q}_p$, or indeed, as will be seen, in terms of $\mathbb{Z}_p$.

Next we will move on to look at how a lattice-proxy is defined on our 
encoding, and thereby translation operators. 

Finally, we will look at how an expansive automorphism can be replaced 
by expansive mapping in the finite and discrete case. 

In all of this, we follow the methodology described by Benedetto (2000a); 
but we restrict all aspects to the finite, discrete context. 

5.1 The Wreath Product Group Corresponding to a Hierarchical
Clustering

For the group actions, with respect to which we will seek invariance, we
consider independent cyclic shifts of the subnodes of a given node (hence, at each
level). Equivalently these actions are adjacency preserving permutations
of subnodes of a given node (i.e., for given $q$, the permutations of $\{q', q''\}$).
Due to the binary tree, or strictly pairwise agglomerations represented by
the hierarchy, the "adjacency" property is trivial. We have therefore cyclic
group actions at each node, where the cyclic group is of order 2.

The symmetries of H are given by structured permutations of the ter- 
minals. The terminals will be denoted here by Term H. The full group of 
symmetries is summarized by the following algorithm: 

1. For level $l = n - 1$ down to 1 do:

2. Selected node, $\nu \leftarrow$ node at level $l$.

3. And permute subnodes of $\nu$.

Subnode $\nu$ is the root of subtree $H_\nu$. We denote $H_{n-1}$ simply by $H$. For
a subnode $\nu'$ undergoing a relocation action in step 3, the internal structure
of subtree $H_{\nu'}$ is not altered.

The algorithm described defines the automorphism group which is a
wreath product of the symmetric group. Denote the permutation at level $\nu$
by $P_\nu$. Then the automorphism group is given by:

$$G = P_{n-1} \text{ wr } P_{n-2} \text{ wr } \ldots \text{ wr } P_2 \text{ wr } P_1$$

where wr denotes the wreath product. 
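
For a small binary tree the effect of these node-wise permutations can be enumerated directly. The sketch below is illustrative only: the nested-tuple encoding of the tree and the function names are assumptions. It applies an independent swap (the order-2 cyclic action) at each internal node and collects the resulting left-to-right orders of the terminals.

```python
from itertools import product


def internal_nodes(tree):
    """Count the internal nodes of a binary tree given as nested 2-tuples."""
    if not isinstance(tree, tuple):
        return 0
    left, right = tree
    return 1 + internal_nodes(left) + internal_nodes(right)


def terminal_order(tree, flips):
    """Terminal order after swapping the two subnodes of every internal node
    whose flag, drawn in turn from the iterator `flips`, is True."""
    if not isinstance(tree, tuple):
        return [tree]
    left, right = tree
    if next(flips):
        left, right = right, left
    return terminal_order(left, flips) + terminal_order(right, flips)


def all_orders(tree):
    """All terminal orders reachable by independent swaps at the nodes."""
    k = internal_nodes(tree)
    return {tuple(terminal_order(tree, iter(choice)))
            for choice in product([False, True], repeat=k)}


# Hypothetical dendrogram on n = 4 terminals (terminals must not be tuples).
tree = (("a", "b"), ("c", "d"))
orders = all_orders(tree)
```

With $n = 4$ terminals there are $n - 1 = 3$ internal nodes, and the enumeration yields $2^{n-1} = 8$ distinct orders. Orders that would break up a cluster, such as a, c, b, d, are not reachable.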

Call Term $H_\nu$ the terminals that descend from the node at level $\nu$. So
these are the terminals of the subtree $H_\nu$ with its root node at level $\nu$. We
can alternatively call Term $H_\nu$ the cluster associated with level $\nu$.

We will now look at shift invariance under the group action. This
amounts to the requirement for a constant function defined on Term $H_\nu$, $\forall \nu$.
A convenient way to do this is to define such a function on the set Term $H_\nu$
via the root node alone, $\nu$. By definition then we have a constant function
on the set Term $H_\nu$.

Let us call $V_\nu$ a space of functions that are constant on Term $H_\nu$. Possible
bases of $V_\nu$ that were considered in Paper I are:

1. Basis vector with $|\text{Term } H_\nu|$ components, with 0 values except for value
1 for component $i$.

2. Set (of cardinality $n = |\text{Term } H_\nu|$) of $m$-dimensional observation
vectors.

The constant function maps

$$L(\text{Term } H_\nu) \longrightarrow \mathbb{R}$$

where $L$ is the space of complex valued functions on the set Term $H$.

Now we consider the resolution scheme arising from moving from
$\{\text{Term } H_{\nu'}, \text{Term } H_{\nu''}\}$ to Term $H_\nu$. From the hierarchical clustering point of
view it is clear what this represents: simply, an agglomeration of two clusters
called Term $H_{\nu'}$ and Term $H_{\nu''}$, replacing them with a new cluster, Term
$H_\nu$.

Let the spaces of constant functions corresponding to the two cluster
agglomerands be denoted $V_{\nu'}$ and $V_{\nu''}$. These two clusters are disjoint
initially, which motivates us taking the two spaces as a couple: $(V_{\nu'}, V_{\nu''})$. In
the same way, let the space of constant functions corresponding to node $\nu$
be denoted $V_\nu$.

The multiresolution scheme uses a space of zero mean denoted $W_{\nu'\nu''}$
with mean defined on the couple of spaces, $(V_{\nu'}, V_{\nu''})$.

In considering spaces of constant functions, $V_{\nu'}$ and $V_{\nu''}$, we know that
the supports of these spaces are, respectively, Term $H_{\nu'}$ and Term $H_{\nu''}$. So
if, instead of the space of zero mean denoted $W_{\nu'\nu''}$ where mean is defined
on the couple of spaces, $(V_{\nu'}, V_{\nu''})$, we considered the mean of the combined
support, Term $H_{\nu'} \cup$ Term $H_{\nu''}$, then the result would be quite different. We
would, in fact, have a cluster-weighted mean value.
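
The difference between the two means can be seen in a short numerical sketch. The function names and values are illustrative assumptions, and scalars are used for brevity; the same expressions apply componentwise to vectors.

```python
def couple_mean(f1, f2):
    """Mean defined on the couple of spaces: each subnode counts once."""
    return (f1 + f2) / 2


def weighted_mean(f1, n1, f2, n2):
    """Mean over the combined support: each subnode is weighted by the
    number of terminals (cluster cardinality) it contains."""
    return (n1 * f1 + n2 * f2) / (n1 + n2)


# Hypothetical constant values and cluster cardinalities for two subnodes.
f_a, card_a = 2.0, 3    # constant value 2.0 on a cluster of 3 terminals
f_b, card_b = 10.0, 1   # constant value 10.0 on a singleton

unweighted = couple_mean(f_a, f_b)
weighted = weighted_mean(f_a, card_a, f_b, card_b)
```

The two values coincide only when the two clusters have equal cardinality.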

5.2 Example 

Let us exemplify a case that satisfies all that has been defined in the context 
of the wreath product invariance that we are targeting. It also exemplifies 
the algorithm discussed in depth in Paper I. Take the constant function on
$V_{\nu'}$ to be $f_{\nu'}$. Take the constant function on $V_{\nu''}$ to be $f_{\nu''}$. Then define
the constant function on $V_\nu$ to be $(f_{\nu'} + f_{\nu''})/2$. Next define the zero mean
function on $W_{\nu'\nu''}$ to be:

$$w_{\nu'} = (f_{\nu'} + f_{\nu''})/2 - f_{\nu'}$$

in the support interval of $V_{\nu'}$, i.e. Term $H_{\nu'}$, and

$$w_{\nu''} = (f_{\nu'} + f_{\nu''})/2 - f_{\nu''}$$

in the support interval of $V_{\nu''}$, i.e. Term $H_{\nu''}$.
Evidently $w_{\nu'} = -w_{\nu''}$.

5.3 Inverse Transform 

Following on from the previous subsection, a demonstration that the
algorithm allows for exact reconstruction of the data (the inverse transform)
is as follows. The constant function on Term $H$ corresponding to the root
node is $f_{n-1}$. The two subnodes of the root node, at levels $\nu'$ and $\nu''$, are
reconstructed from $f_{n-1} \pm w_{\nu'}$ (and, as we have seen, we can either use $w_{\nu'}$ or
$w_{\nu''}$). We next look at the subtrees whose roots are given by the nodes (just
considered) at levels $\nu'$ and $\nu''$, and these subtrees are necessarily disjoint.
All subnodes of these currently selected nodes are reconstructed using the
same algorithm. This procedure is iteratively continued until the terminals
have been dealt with.
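
The top-down reconstruction can be sketched as follows, under stated assumptions rather than as the implementation of Paper I. The function name and the merge encoding are hypothetical: terminals are numbered 0 to n - 1, and the k-th agglomeration creates node n + k from the pair `merges[k]`. The stored detail for a node is taken to be the node's constant value minus that of the second subnode of the pair, so the two subnodes are recovered as the node value plus and minus the detail.

```python
def inverse_dendrogram_haar(details, final_smooth, merges):
    """Rebuild the n observation vectors from the root smooth and details.

    details[k]   : detail vector stored for the node created by merge k,
                   assumed equal to (node value) - (second subnode value).
    final_smooth : constant function value at the root node.
    merges[k]    : pair (first subnode id, second subnode id) of merge k.
    """
    n = len(merges) + 1
    smooth = {n + len(merges) - 1: list(final_smooth)}
    # Later merges are nearer the root, so walk the merges in reverse:
    # every node is reconstructed before its own subnodes are needed.
    for k in range(len(merges) - 1, -1, -1):
        first, second = merges[k]
        s_node, w = smooth[n + k], details[k]
        smooth[first] = [s + d for s, d in zip(s_node, w)]
        smooth[second] = [s - d for s, d in zip(s_node, w)]
    return [smooth[i] for i in range(n)]


# Hypothetical transform output for 4 one-dimensional observations.
merges = [(0, 1), (2, 3), (4, 5)]
details = [[-1.0], [-2.0], [-3.0]]
final_smooth = [5.0]
reconstructed = inverse_dendrogram_haar(details, final_smooth, merges)
```

Setting selected detail vectors to zero before calling the inverse would give a filtered approximation of the data, which is the kind of use that Paper I describes.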

5.4 Link with Agglomerative Hierarchical Clustering
Algorithms

Comparison with traditional clustering criteria is considered next. It is clear 
why agglomerative levels are very problematic if used for choosing a good 
partition in a hierarchical clustering tree: they increase with agglomeration, 
simply because the cluster centers are getting more and more spread out as 
the sequence of agglomerations proceeds. Directly using these agglomerative 
levels has been a way to derive a partition for a very long time. An early 
reference is Mojena (1977). To see how the wavelet transform used by us
leads to a very different outcome, see Paper I, where we describe use of the
norms of the $w_\nu$ vectors.

When we consider agglomerative hierarchical clustering algorithms it is 
clear that (i) clusters are often defined in terms of center of gravity, or 
mean; and (ii) this allows for defining a (vector) difference term between a 
cluster and its immediate sub-clusters. It is also clear that the Haar wavelet 
algorithm is close to the method known as median or Gower's or WPGMC 
- weighted pair group method using centroids: see table, p. 68, of Murtagh 
(1985). 

We could also cater for other agglomerative criteria, subject to storing 
the cluster cardinality values, and develop an algorithm that is close to the 
Haar wavelet one. Constant functions on spaces $V_{\nu'}$ and $V_{\nu''}$ remain just as
before. The zero mean functions on space $W_{\nu'\nu''}$ would now be generalized
to weighted mean (with weights given by cluster cardinalities). Viewed in this
light, our work has led us to develop a new storage structure for hierarchical 
clustering trees, which is particularly beneficial for data filtering objectives. 

The novelty of our work resides in two areas: (i) we have shown the 
close association between two classes of multiple resolution data analysis 
approaches, agglomerative hierarchical clustering algorithms and the wavelet 
transform; and (ii) our motivation is not at all to construct hierarchical 
clusterings in a new way but rather to illuminate further inherent ultrametric 
properties of data (cf. Murtagh, 2004). 
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6 An Algebraic Representation of a Hierarchy 



6.1 Introduction 

The dendrogram wavelet transform has been seen to be a set of applications
of a function applied to set members, or cluster members, associated with
nodes of the hierarchy H. The non-singleton clusters comprise the set
$Q = \{q_1, q_2, \ldots, q_{n-1}\}$. The wavelet function is applied in turn to
$q_1, q_2, \ldots$. In this paper, we are using the Haar wavelet function. But we
could well use others (e.g., Altaisky, 2004, uses the Morlet wavelet).

With each $q \in Q$ there is an associated level function, $\nu : q \to \mathbb{R}^+$,
which induces a total order on Q. We will show that the application of the
wavelet function to this sequence of clusters is "dilatory" or "expansive" in
two different ways.

We will return to what these two different ways are in a moment. The
hierarchy $H_{\nu=0}$ contains an increasingly embedded sequence of subsets
(corresponding to increasingly pruning the branches of the tree; Bouchki, 1996).
We have: $H_{\nu=0} \supset H_{\nu_1} \supset H_{\nu_2} \supset \ldots \supset H_{\nu_{n-1}}$. Define Term as the
set of terminal nodes of a hierarchy, and Card the cardinality of a set.
Then $\mathrm{Term}(H_{\nu=0}) = I$ with $\mathrm{Card}(I) = n$. $\mathrm{Card}(\mathrm{Term}(H_{\nu_1})) = n - 1$.
$\mathrm{Card}(\mathrm{Term}(H_{\nu_2})) = n - 2$. ... $\mathrm{Card}(\mathrm{Term}(H_{\nu_{n-1}})) = 1$. Each application
of the wavelet function is to the minimal (non-singleton) cluster (again see
Bouchki, 1996) in each $H_{\nu_k}$. Below, we will see how we promote all $q \in H_{\nu_k}$
to the corresponding $q \in H_{\nu_{k+1}}$ by multiplying clusters (in a particular
algebraic representation) by 1/p, which is of norm p.

The "dilatory" or "expansive" character of our sequence of operations,
viz., application of the wavelet function, comes from (i) the sequence of
embedded subsets of H - so we still operate on a cluster, but the data on
which we work becomes smaller; or (ii) the sequence of levels at which we
work is derived from repeatedly taking the product with 1/p of norm p.

In order to introduce this product with 1/p of norm p we first describe
the p-adic algebraic representation of the hierarchy. In this representation,
clusters, including singletons, have an operator, denoted $\oplus$. This operator
allows clusters to be defined. Next, we have a null element in Q in this
algebraic representation, and a norm of each $q \in Q$. Hence for $q', q'' \in Q$,
we can define $q' \oplus q''$. We have: $\exists q \in Q$ s.t. $q = 0$. Finally, $\forall q \in Q$ we have
$\|q\|$.
6.2 H Expressed p-Adically: p-Adic Encoding of a Dendrogram

We will introduce now the one-to-one mapping of clusters (including
singletons) in H into a set of p-adically expressed integers (a fortiori, rationals,
$\mathbb{Q}_p$). The field of p-adic numbers is the most important example of
ultrametric spaces. Addition and multiplication of p-adic integers, $\mathbb{Z}_p$, are
well-defined. Inverses exist and no zero-divisors exist.

A terminal-to-root traversal in a dendrogram or binary rooted tree is
defined as follows. We use the path $x \subset q \subset q' \subset q'' \subset \ldots \subset q_{n-1}$, where x is a
given object specifying a given terminal, and $q, q', q'', \ldots$ are the embedded
classes along this path, specifying nodes in the dendrogram. The root node
is specified by the class $q_{n-1}$ comprising all objects.

A terminal-to-root traversal is the shortest path between the given ter- 
minal node and the root node, assuming we preclude repeated traversal 
(backtrack) of the same path between any two nodes. 

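By means of terminal-to-root traversals, we define the following p-adic
encoding of terminal nodes, and hence objects, in Figure 1.

\begin{align*}
x_1 &: \; +1 \cdot p^1 + 1 \cdot p^2 + 1 \cdot p^5 + 1 \cdot p^7 \\
x_2 &: \; -1 \cdot p^1 + 1 \cdot p^2 + 1 \cdot p^5 + 1 \cdot p^7 \\
x_3 &: \; -1 \cdot p^2 + 1 \cdot p^5 + 1 \cdot p^7 \\
x_4 &: \; +1 \cdot p^3 + 1 \cdot p^4 - 1 \cdot p^5 + 1 \cdot p^7 \\
x_5 &: \; -1 \cdot p^3 + 1 \cdot p^4 - 1 \cdot p^5 + 1 \cdot p^7 \\
x_6 &: \; -1 \cdot p^4 - 1 \cdot p^5 + 1 \cdot p^7 \\
x_7 &: \; +1 \cdot p^6 - 1 \cdot p^7 \\
x_8 &: \; -1 \cdot p^6 - 1 \cdot p^7
\end{align*}
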
If we choose p = 2 the resulting decimal equivalents could be identical:
cf. contributions based on $+1 \cdot p^1$ and $-1 \cdot p^1 + 1 \cdot p^2$. Given that the coefficients
of the $p^j$ terms ($1 \le j \le 7$) are in the set $\{-1, 0, +1\}$ (implying for $x_1$ the
additional terms: $+0 \cdot p^3 + 0 \cdot p^4 + 0 \cdot p^6$), the coding based on p = 3 is
required to avoid ambiguity among decimal equivalents.
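
The ambiguity for p = 2, and its absence for p = 3, can be checked numerically. The following is a minimal sketch, assuming the Figure 1 codes listed above; the names `CODES` and `decimal_equivalent` are illustrative only and do not come from the paper.

```python
# Coefficients of p^1 .. p^7 for each terminal node of Figure 1
# (transcribed from the encoding above; 0 means "node not on the path").
CODES = {
    "x1": [+1, +1, 0, 0, +1, 0, +1],
    "x2": [-1, +1, 0, 0, +1, 0, +1],
    "x3": [0, -1, 0, 0, +1, 0, +1],
    "x4": [0, 0, +1, +1, -1, 0, +1],
    "x5": [0, 0, -1, +1, -1, 0, +1],
    "x6": [0, 0, 0, -1, -1, 0, +1],
    "x7": [0, 0, 0, 0, 0, +1, -1],
    "x8": [0, 0, 0, 0, 0, -1, -1],
}


def decimal_equivalent(coeffs, p):
    """Evaluate sum_j c_j * p**j, with j starting at 1."""
    return sum(c * p**j for j, c in enumerate(coeffs, start=1))


# The two partial codes named in the text: +1*p^1 and -1*p^1 + 1*p^2.
first, second = [+1, 0], [-1, +1]
collide_p2 = decimal_equivalent(first, 2) == decimal_equivalent(second, 2)
collide_p3 = decimal_equivalent(first, 3) == decimal_equivalent(second, 3)

# Decimal equivalents of the eight objects for p = 3.
decimals_p3 = {name: decimal_equivalent(c, 3) for name, c in CODES.items()}
```

With coefficients in {-1, 0, +1} and p = 3 the expansion is a balanced ternary one, which is why distinct coefficient vectors cannot share a decimal value.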

A few general remarks on this encoding follow. For the labeled ranked
binary trees that we are considering, we require the labels +1 and -1 for
the two branches at any node. Of course we could interchange these labels,
and have these +1 and -1 labels reversed at any node. By doing so we will
have different p-adic codes for the objects, $x_i$.
The following properties hold: (i) Unique encoding: the decimal codes
for each $x_i$ (lexicographically ordered) are unique for $p \ge 3$; and (ii)
Reversibility: the dendrogram can be uniquely reconstructed from any such
set of unique codes.

The p-adic encoding defined for any object set above can be expressed
as follows for any object x associated with a terminal node:

$$x = \sum_{j=1}^{n-1} c_j p^j \quad \text{where } c_j \in \{-1, 0, +1\} \qquad (7)$$

In greater detail we have:

$$x_i = \sum_{j=1}^{n-1} c_{ij} p^j \quad \text{where } c_{ij} \in \{-1, 0, +1\} \qquad (8)$$

Here j is the level or rank (root: n - 1; terminal: 1), and i is an object
index.

In our examples we have used: $c_{ij} = +1$ for a left branch (in the sense
of Figure 1), $= -1$ for a right branch, and $= 0$ when the node is not on the
path from that particular terminal to the root.

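A matrix form of this encoding is as follows, where $\{\cdot\}^t$ denotes the
transpose of the vector.

Let x be the column vector $\{x_1\ x_2\ \ldots\ x_n\}^t$.

Let p be the column vector $\{p^1\ p^2\ \ldots\ p^{n-1}\}^t$.

Define a characteristic matrix C of the branching codes, +1 and -1, and
an absent or non-existent branching given by 0, as a set of values $c_{ij}$ where
$i \in I$, the indices of the object set; and $j \in \{1, 2, \ldots, n-1\}$, the indices
of the dendrogram levels or nodes ordered increasingly. For Figure 1 we
therefore have:

$$C = \{c_{ij}\} = \begin{pmatrix}
 1 &  1 &  0 &  0 &  1 &  0 &  1 \\
-1 &  1 &  0 &  0 &  1 &  0 &  1 \\
 0 & -1 &  0 &  0 &  1 &  0 &  1 \\
 0 &  0 &  1 &  1 & -1 &  0 &  1 \\
 0 &  0 & -1 &  1 & -1 &  0 &  1 \\
 0 &  0 &  0 & -1 & -1 &  0 &  1 \\
 0 &  0 &  0 &  0 &  0 &  1 & -1 \\
 0 &  0 &  0 &  0 &  0 & -1 & -1
\end{pmatrix} \qquad (9)$$
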
For given level j, $\forall i$, the absolute values $|c_{ij}|$ give the membership
function either by node, j, which is therefore read off columnwise; or by object
index, i, which is therefore read off rowwise.

Figure 1: Labeled, ranked dendrogram on 8 terminal nodes, $x_1, x_2, \ldots, x_8$.
Branches are labeled +1 and -1. Clusters are: $q_1 = \{x_1, x_2\}$, $q_2 =
\{x_1, x_2, x_3\}$, $q_3 = \{x_4, x_5\}$, $q_4 = \{x_4, x_5, x_6\}$, $q_5 = \{x_1, x_2, x_3, x_4, x_5, x_6\}$,
$q_6 = \{x_7, x_8\}$, $q_7 = \{x_1, x_2, \ldots, x_7, x_8\}$.

The matrix form of the p-adic encoding is:

$$x = Cp \qquad (10)$$

Here, x is the decimal encoding, C is the matrix with dendrogram
branching codes and p is the vector of powers of a fixed integer (usually,
more restrictively, fixed prime) p.
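
As an illustration of the matrix form, the sketch below builds the branching matrix C for Figure 1 from the list of merges and evaluates x = Cp. It assumes the left branch of each node carries +1 and the right branch -1; `MERGES`, `members`, `characteristic_matrix` and `encode` are hypothetical names chosen for this example.

```python
# Each non-singleton cluster q_j, j = 1..7, as (left child, right child).
# Assumption: left child is labeled +1, right child -1, as in Figure 1.
MERGES = [
    ("x1", "x2"),  # q1
    ("q1", "x3"),  # q2
    ("x4", "x5"),  # q3
    ("q3", "x6"),  # q4
    ("q2", "q4"),  # q5
    ("x7", "x8"),  # q6
    ("q5", "q6"),  # q7, the root
]
OBJECTS = [f"x{i}" for i in range(1, 9)]


def members(node):
    """Terminal nodes below a node, found by recursive descent."""
    if node.startswith("x"):
        return [node]
    left, right = MERGES[int(node[1:]) - 1]
    return members(left) + members(right)


def characteristic_matrix():
    """n x (n-1) matrix of branching codes in {-1, 0, +1}."""
    C = [[0] * len(MERGES) for _ in OBJECTS]
    for j, (left, right) in enumerate(MERGES):
        for name in members(left):
            C[OBJECTS.index(name)][j] = +1
        for name in members(right):
            C[OBJECTS.index(name)][j] = -1
    return C


def encode(C, p):
    """Matrix-vector product C p with p = (p^1, ..., p^(n-1))."""
    powers = [p ** (j + 1) for j in range(len(C[0]))]
    return [sum(c * pw for c, pw in zip(row, powers)) for row in C]


C = characteristic_matrix()
x = encode(C, 3)
```

Reading a column of C gives the membership of one node; reading a row gives the terminal-to-root path of one object.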

The tree encoding exemplified in Figure 1 and defined with coefficients
in equations (7) or (8), (9) or (10), with labels +1 and -1, is not commonly
used: zero and one labels are more common. We required the ±1 labels,
however, to fully cater for the ranked nodes (i.e. the total order, as opposed
to a partial order, on the nodes).

We can consider the objects that we are dealing with to have equivalent
integer values. To show that, all we must do is work out decimal equivalents
of the p-adic expressions used above for $x_1, x_2, \ldots$. As noted in Gouvea
(2003), we have equivalence between: a p-adic number; a p-adic expansion;
and an element of $\mathbb{Z}_p$ (the p-adic integers). The coefficients used to specify
a p-adic number, Gouvea (2003) notes (p. 69), "must be taken in a set of
representatives of the class modulo p. The numbers between 0 and p - 1 are
only the most obvious choice for these representatives. There are situations,
however, where other choices are expedient."

6.3 P-adic Dendrogram Addition and Multiplication 

As noted already the wavelet basis on $L^2(\mathbb{R}^m)$ is often induced from the
discrete subgroup, $\mathbb{Z}^m$. Now for a discrete subgroup we use the dendrogram,
H. The addition operation on the group H will now be explored.

In order to define a group structure on the p-adic encoded objects, we
require an addition operation. We do not "carry and add" in the traditional
way because this does not make sense in this context. Instead we define the
following "average and threshold" operation for any coefficients (of values
of p, as used in equations (8) or (10)). We define the following compositions
for such coefficients.
Examples from the encoding defined above for $x_1, x_2, \ldots$ (again with
reference to Figure 1 and equations (7) or (8), (9) or (10)) follow.

\begin{align*}
x_1 \oplus x_2 &= +1 \cdot p^2 + 1 \cdot p^5 + 1 \cdot p^7 \\
x_1 \oplus x_3 &= +1 \cdot p^5 + 1 \cdot p^7 \\
x_1 \oplus x_7 &= 0 \\
x_3 \oplus x_6 &= +1 \cdot p^7 \\
x_5 \oplus x_8 &= 0
\end{align*}

Informally: in the tree, this addition operation only retains non-zero 
terms for nodes in the tree strictly above the first (i.e. lowest level) cluster 
within which the two objects find themselves. This means that if the two 
objects only find themselves together for the first time in the same cluster 
that contains all objects then the result of the addition operation is 0. 

Let us use our "average and threshold" operation, which we are using as
a customized addition, to define clusters. We will do so by example, taking
Figure 1 as our case study. We will call the clusters, ranked by increasing
node level, $q_1, q_2, \ldots$ as used in the caption of Figure 1.

\begin{align*}
q_1 &= x_1 \oplus x_2 = +1 \cdot p^2 + 1 \cdot p^5 + 1 \cdot p^7 \\
q_2 &= q_1 \oplus x_3 = +1 \cdot p^5 + 1 \cdot p^7 \\
q_3 &= x_4 \oplus x_5 = +1 \cdot p^4 - 1 \cdot p^5 + 1 \cdot p^7 \\
q_4 &= q_3 \oplus x_6 = -1 \cdot p^5 + 1 \cdot p^7 \\
q_5 &= q_2 \oplus q_4 = +1 \cdot p^7 \\
q_6 &= x_7 \oplus x_8 = -1 \cdot p^7 \\
q_7 &= q_5 \oplus q_6 = 0
\end{align*}

The trivial cluster containing all n objects, $q_{n-1}$, is of value 0 in this
representation.
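
The worked examples can be reproduced with a few lines of code. This is a sketch under an explicit assumption: the coefficient rule used here (average the two coefficients, then keep the result only if it has magnitude 1) is inferred from the examples above, not copied from the paper's composition table. The names `X` and `add` are illustrative.

```python
# Coefficients of p^1 .. p^7 for the terminal nodes x1 .. x8 of Figure 1.
X = {
    1: [+1, +1, 0, 0, +1, 0, +1],
    2: [-1, +1, 0, 0, +1, 0, +1],
    3: [0, -1, 0, 0, +1, 0, +1],
    4: [0, 0, +1, +1, -1, 0, +1],
    5: [0, 0, -1, +1, -1, 0, +1],
    6: [0, 0, 0, -1, -1, 0, +1],
    7: [0, 0, 0, 0, 0, +1, -1],
    8: [0, 0, 0, 0, 0, -1, -1],
}


def add(u, v):
    """Assumed 'average and threshold' rule, applied per coefficient:
    the average survives only when it equals +1 or -1."""
    out = []
    for a, b in zip(u, v):
        avg = (a + b) / 2
        out.append(int(avg) if abs(avg) == 1 else 0)
    return out


# Clusters of Figure 1, built in order of increasing node level.
q1 = add(X[1], X[2])
q2 = add(q1, X[3])
q3 = add(X[4], X[5])
q4 = add(q3, X[6])
q5 = add(q2, q4)
q6 = add(X[7], X[8])
q7 = add(q5, q6)
```

Under this rule a coefficient is retained only where both operands already agree, which matches the informal description: only terms strictly above the first common cluster survive.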

Definition of Null Element:

On the dendrogram H, the set $q_{n-1} = I$ is the null element when using our
p-adic encoding (given in definitions (8) and (10)) and addition operation
(11).

Defining p-adic notation for clusters in this way allows us to define norms
of clusters; or to define p-adic distances between clusters; or indeed to define
p-adic distances between clusters and objects (singletons, terminals). We
will look at these in subsection 6.4 below.

For completeness we will provide a definition of p-adic dendrogram
multiplication. Take $x = \sum_j c_j p^j$ and let $y = \sum_j c'_j p^j$. The product operation
is defined on the formal (Laurent) power series as:
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3 J \ f / n' 
with restriction to the term in p^~^. P-adic dendrogram multiplication will 
be used below in the definition of the expansive operator: this is multipli- 
cation by 1/p. 

6.4 P-adic Distance and Norm on a Dendrogram 

Thus far, we have been concerned with an analytic framework. Now we will 
induce a metric topology on H. 

To find the p-adic distance, we look for the term $p^r$ in the p-adic codes
of the two objects, where r is the lowest level such that the coefficients
of $p^r$ are both non-zero and equal in absolute value.

Let us look at the set of p-adic codes for $x_1, x_2, \ldots$ above (Figure 1), to
give some examples of this.

For $x_1$ and $x_2$, we find the term we are looking for to be $p^1$, and so r = 1.
For $x_1$ and $x_5$, we find the term we are looking for to be $p^5$, and so r = 5.
For $x_5$ and $x_8$, we find the term we are looking for to be $p^7$, and so r = 7.

Having found the value r, the distance is defined as $p^{-r}$.
See, inter alia, Benzecri (1979), and Gouvea (2003), for this definition of
ultrametric distance.
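
The rule for finding r can be sketched as follows. The code assumes that "equal absolute values" is meant for non-zero coefficients only (otherwise two zero coefficients at a low level would match trivially), and it returns 0 for identical codes; both conventions, and the function names, are assumptions of this example.

```python
from fractions import Fraction

# Codes of three terminal nodes of Figure 1 (coefficients of p^1 .. p^7).
x1 = [+1, +1, 0, 0, +1, 0, +1]
x2 = [-1, +1, 0, 0, +1, 0, +1]
x5 = [0, 0, -1, +1, -1, 0, +1]
x8 = [0, 0, 0, 0, 0, -1, -1]


def shared_level(u, v):
    """Lowest level r at which both codes have a non-zero coefficient."""
    for level, (a, b) in enumerate(zip(u, v), start=1):
        if a != 0 and b != 0:
            return level
    return None


def padic_distance(u, v, p):
    """Distance p^(-r), kept exact as a Fraction."""
    if u == v:
        return Fraction(0)
    r = shared_level(u, v)
    if r is None:
        raise ValueError("codes share no level")
    return Fraction(1, p ** r)
```

For example, `padic_distance(x1, x5, 3)` evaluates to 1/243, that is, 3 to the power -5.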

Examples based on Figure 1:

$|x_1 - x_2|_p = |x_2 - x_1|_p = p^{-1}$.
$|x_1 - x_4|_p = |x_4 - x_1|_p = p^{-5}$.

Examples for clusters from Figure 1:

$|q_1 - q_3|_p = |q_3 - q_1|_p = p^{-5}$.
$|q_2 - q_6|_p = |q_6 - q_2|_p = p^{-7}$.

We take for a singleton object r = 0, and so the norm of an object is al- 
ways 1. We therefore define the p-adic norm, of an object corresponding 
to a terminal node in the following way: for any object, x, \x\p = 1. 

The norm of a non-singleton cluster is defined analogously. It is seen to 
be strictly smaller. We have: \q2\p = \qi\p = P~'^- 

For the expansive operator that we use for dilation, we will consider
product with 1/p. The norm associated with this operator is seen to be
$|1/p|_p = |p^{-1}|_p = p^{-(-1)} = p$.
The operator given by multiplication by 1/p therefore has norm or modulus p.

The p-adic norm, or p-adic valuation, satisfies the following properties
(Schikhof, 1984):

1. $|x|_p \ge 0$; $|x|_p = 0$ iff $x = 0$

2. $|x + y|_p \le \max(|x|_p, |y|_p)$

3. $|xy|_p = |x|_p |y|_p$

We also have: $|q|_p \le 1$ with equality only if q is a singleton.

6.5 Modified Dilation Operation: Multiplication by 1/p

Consider the set $\{x_i \mid i \in I\}$ with its p-adic coding considered above. Take
p = 2. (Non-uniqueness of corresponding decimal codes is not of concern
to us now, and taking this value for p is without any loss of generality.)
Multiplication of $x_1 = +1 \cdot 2^1 + 1 \cdot 2^2 + 1 \cdot 2^5 + 1 \cdot 2^7$ by 1/p = 1/2 gives:
$+1 \cdot 2^1 + 1 \cdot 2^4 + 1 \cdot 2^6$. Each level has decreased by one, and the lowest level
has been lost. Subject to the lowest level of the tree being lost, the form
of the tree remains the same. By carrying out the multiplication-by-1/p
operation on all objects, it is seen that the effect is to rise in the hierarchy
by one level.
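
On the coefficient vectors, this operation is a shift: every coefficient moves down one level and the coefficient that would land on $p^0$ is discarded. The sketch below assumes that reading; `multiply_by_inverse_p` is a name invented for the example.

```python
# Codes of x1 and x2 from Figure 1 (coefficients of p^1 .. p^7).
x1 = [+1, +1, 0, 0, +1, 0, +1]
x2 = [-1, +1, 0, 0, +1, 0, +1]


def multiply_by_inverse_p(coeffs):
    """Shift levels down by one; the old level-1 term is lost and the
    top level is padded with 0 so the vector keeps its length."""
    return coeffs[1:] + [0]


once_x1 = multiply_by_inverse_p(x1)   # p^1 + p^4 + p^6
once_x2 = multiply_by_inverse_p(x2)

# Repeated application traces the path from the terminal to the root.
path = [x1]
while any(path[-1]):
    path.append(multiply_by_inverse_p(path[-1]))
```

After one application `x1` and `x2` carry the same code, which reflects the merging of the two terminals into $q_1$; after n - 1 = 7 applications every code is the null element.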

Let us call product with 1/p the operator A. The effect of losing the
bottom level of the dendrogram means that either (i) each cluster (possibly
singleton) remains the same; or (ii) two clusters are merged. Therefore the
application of A to all q implies a subset relationship between the set of
clusters {q} and the result of applying A, {Aq}.

Repeated application of the operator A gives $Aq, A^2q, A^3q, \ldots$. Starting
with any singleton, $i \in I$, this gives a path from the terminal to the root
node in the tree. Each such path ends with the null element, as a result
of the Null Element definition (section 6.3). Therefore the intersection of the
paths equals the null element.

Benedetto and Benedetto (2004) discuss A as an expansive automorphism
of I, i.e. form-preserving, and locally expansive.

Some implications of Benedetto and Benedetto's (2004) expansive
automorphism follow.

For any q, let us take $q, Aq, A^2q, \ldots$ as a sequence of open subgroups of
I, with $q \subset Aq \subset A^2q \subset \ldots$, and $I = \bigcup\{q, Aq, A^2q, \ldots\}$. This is termed
an inductive sequence of I, and I itself is the inductive limit (Reiter and
Stegeman, 2000).

Each path defined by application of the expansive automorphism defines 
a spherically complete system (Schikhof, 1984; Gajic, 2001), which is a 
formalization of well-defined subset embeddedness. 

We now return to our starting point, the Haar algorithm given in section
1.4. We apply the averaging and differencing operations to each cluster in
sequence. But now, after doing this for cluster q, we apply the operator A,
i.e. the 1/p product to the p-adic representation of the dendrogram. This
causes us to move up a level. This is our enhanced concept of dilation,
which we apply to the dendrogram, where we keep the same averaging and
differencing operations applied to the cluster in sequence.

7 Wavelet Bases from the Wreath Product Group 

In our case we are looking for a new basis for $L^2(G)$ where G is the set of
all equivalent representations of a hierarchy, H, on n terminals. Denoting
the level index of H as $\nu$ (so $\nu : H \to \mathbb{R}^+$, where $\mathbb{R}^+$ are the positive
reals), and $\nu = 0$ is the level index corresponding to the fine partition of
singletons, then this hierarchy will also be denoted as $H_{\nu=0}$. Let I be the
set of observations. Let the succession of clusters associated with nodes in
H be denoted $Q = \{q_1, q_2, \ldots, q_{n-1}\}$. We have n - 1 non-singleton nodes
in H, associated with the clusters, q. At each node we can interchange left
and right subnodes. Hence we have $2^{n-1}$ equivalent representations of H,
or, again, members in the group, G, that we are considering.

So we have the group of equivalent dendrogram representations on $H_{\nu=0}$.
We have a series of subgroups, $H_{\nu_k} \supset H_{\nu_{k+1}}$, for $0 \le k < n - 1$. Symmetries
are given by permutations at each level, $\nu$, of hierarchy H. Collecting these
furnishes a group of symmetries on the terminal set of any given
(non-terminal) node in H.

The practical application arises through identifying the n terminal nodes
with (i) m-dimensional vectors, or (ii) n-dimensional hypercube vertices.
On the latter sets of vectors we can also consider an associated permutation
representation.

Parenthetically, we note that the permutations in this representation are
known as the alternating or zig-zag permutations and are counted by the Andre
or Euler numbers (Murtagh, 1984a; sequence A000111 in Sloane, 2005).

In this work we ignore another form of equivalent representation, i.e.
that arising from two or more level values being identical: $\nu_k = \nu_{k+1}$ for
some $0 \le k < n - 1$. This means that successive nodes can be interchanged.
This situation happens when we have equilateral triangles in the ultrametric
space, as opposed to triangles that are strictly isosceles with small base.

At each non-singleton cluster, q, we define a (trivial) affine group on
$\{q', q''\}$. The group is defined on $Q = \{q_i \mid i = 1, 2, \ldots, n - 1\}$.

Foote et al. (2000a) consider group actions on spherically homogeneous
rooted trees. An example of the latter is the quadtree used in 2D image
processing. (An image is recursively decomposed into spatially homogeneous
quadrant covering regions; and this decomposition is represented as a quadtree.
For 3D image volumes, the data structure becomes an octree.) Just like for us,
the quadtree nodes can "twiddle" around their offspring nodes but, because
of the image regions, group action amounts to cyclic shifts or
adjacency-preserving permutations of the offspring nodes. The relevant group in
this case is referred to as the wreath product group.



8 Matrix Interpretation of the Haar Dendrogram Wavelet Transform

8.1 The Forward Transform 

Consider any hierarchical clustering, H, represented as a binary rooted tree.
For each cluster q'' with offspring nodes q and q', we define s(q'') through
application of the low-pass filter $\begin{pmatrix} 0.5 \\ 0.5 \end{pmatrix}$:

$$s(q'') = \frac{1}{2}\left(s(q) + s(q')\right) = \begin{pmatrix} 0.5 \\ 0.5 \end{pmatrix}^t \begin{pmatrix} s(q) \\ s(q') \end{pmatrix} \qquad (13)$$

The application of the low-pass filter is carried out in order of increasing
node number (i.e., from the smallest non-terminal node, through to the root
node).

Next for each cluster q'' with offspring nodes q and q', we define detail
coefficients d(q'') through application of the band-pass filter $\begin{pmatrix} 0.5 \\ -0.5 \end{pmatrix}$:

$$d(q'') = \frac{1}{2}\left(s(q) - s(q')\right) = \begin{pmatrix} 0.5 \\ -0.5 \end{pmatrix}^t \begin{pmatrix} s(q) \\ s(q') \end{pmatrix} \qquad (14)$$

Again, increasing order of node number is used for application of this
filter. See Paper I for further details.
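
A minimal sketch of this forward pass on the dendrogram of Figure 1 follows. It assumes the smooth of a terminal node is its observation vector, that the left child plays the role of q and the right child that of q', and it uses toy one-dimensional data $x_i = i$; `MERGES` and `forward` are illustrative names, not taken from Paper I.

```python
# Non-singleton clusters q1 .. q7 of Figure 1 as (left child, right child).
MERGES = [
    ("x1", "x2"), ("q1", "x3"), ("x4", "x5"), ("q3", "x6"),
    ("q2", "q4"), ("x7", "x8"), ("q5", "q6"),
]


def forward(data):
    """Apply the low-pass (average) and band-pass (half-difference)
    filters node by node, in order of increasing node number.

    data maps terminal names to observation vectors (lists of floats).
    Returns (smooth, detail), both keyed by node name.
    """
    smooth = dict(data)
    detail = {}
    for j, (left, right) in enumerate(MERGES, start=1):
        s_left, s_right = smooth[left], smooth[right]
        smooth[f"q{j}"] = [(a + b) / 2 for a, b in zip(s_left, s_right)]
        detail[f"q{j}"] = [(a - b) / 2 for a, b in zip(s_left, s_right)]
    return smooth, detail


# Toy data: the observation attached to x_i is the 1-vector [i].
data = {f"x{i}": [float(i)] for i in range(1, 9)}
smooth, detail = forward(data)
```

The final smooth is `smooth["q7"]`; together with the seven detail vectors it holds as many values as the original eight observations.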
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8.2 The Ultrametric Case 



We now return to the issue of how we start this scheme, i.e. how we define
s(i), or the "smooth" of a terminal node. We have distinguished above in
section 3 between:

1. H as representing an ultrametric set of relations, 

2. H as representing an embedded set of sets. 

For case 1 we take s{i) as the m-dimensional observation vector corre- 
sponding to i. So, taking all n vectors s{i) we have the initial data matrix 
X of dimensions n x m. 

Then for our set of n points in $\mathbb{R}^m$ given in the form of matrix X we
have:

$$X = CD + S_{n-1} \qquad (15)$$

where D is the matrix collecting all wavelet projections or detail coefficients,
d. The dimensions of C are: n x (n - 1) (see definition (9)). The dimensions
of D are (n - 1) x m.

If $s_{n-1}$ is the final data smooth, in the limit for very large n a
constant-valued m-component vector, then let $S_{n-1}$ be the n x m matrix with
$s_{n-1}$ repeated on each of the n rows.

Consider the j'th coordinate of the m-dimensional observation vector
corresponding to i. For any $d(q_j)$ we have: $\sum_k d(q_j)_k = 0$, i.e. the detail
coefficient vectors are each of zero mean.

To recapitulate we have: 

X is of dimensions n x m. 

C is of dimensions n x (n — 1). 

D is of dimensions (n — 1) x m. 

$S_{n-1}$ is of dimensions n x m.
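
The decomposition $X = CD + S_{n-1}$ can be checked numerically on the tree of Figure 1. The sketch below uses exact rational arithmetic and made-up two-dimensional data $x_i = (i, i^2)$; it assumes left branches are coded +1 and right branches -1, and every name in it is hypothetical.

```python
from fractions import Fraction

MERGES = [
    ("x1", "x2"), ("q1", "x3"), ("x4", "x5"), ("q3", "x6"),
    ("q2", "q4"), ("x7", "x8"), ("q5", "q6"),
]
NAMES = [f"x{i}" for i in range(1, 9)]


def leaves(node):
    """Terminal nodes below a node."""
    if node.startswith("x"):
        return [node]
    left, right = MERGES[int(node[1:]) - 1]
    return leaves(left) + leaves(right)


def forward(X):
    """Return D, the (n-1) x m detail matrix, and the root smooth."""
    s = {name: row for name, row in zip(NAMES, X)}
    D = []
    for j, (left, right) in enumerate(MERGES, start=1):
        s[f"q{j}"] = [(a + b) / 2 for a, b in zip(s[left], s[right])]
        D.append([(a - b) / 2 for a, b in zip(s[left], s[right])])
    return D, s[f"q{len(MERGES)}"]


def branching_matrix():
    """C, of dimensions n x (n-1), with entries in {-1, 0, +1}."""
    C = [[0] * len(MERGES) for _ in NAMES]
    for j, (left, right) in enumerate(MERGES):
        for name in leaves(left):
            C[NAMES.index(name)][j] = 1
        for name in leaves(right):
            C[NAMES.index(name)][j] = -1
    return C


X = [[Fraction(i), Fraction(i * i)] for i in range(1, 9)]   # n x m, m = 2
D, s_root = forward(X)
C = branching_matrix()

# Row i of CD + S: root smooth plus signed details along the path of x_i.
CD_plus_S = [
    [sum(C[i][j] * D[j][k] for j in range(len(D))) + s_root[k]
     for k in range(len(s_root))]
    for i in range(len(X))
]
```

The identity holds because each observation equals the root smooth plus or minus the detail at every node on its terminal-to-root path, the sign being the branch label.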

8.3 The Case of Embedded Set of Sets 

We have distinguished between 

1. H as representing an ultrametric set of relations, 

2. H as representing an embedded set of sets. 
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We now turn attention to the latter. 

In this case we take s(i) as an n-dimensional indicator vector
corresponding to i. So, taking all n vectors s(i) we have the initial data matrix
X which is none other than the n x n dimensional identity matrix. We will
write $X_{ind}$ for this identity matrix.

The wavelet transform in this case is: $X_{ind} = CD + S_{n-1}$.

$X_{ind}$ is of dimensions n x n.

C, exactly as in case 1 (ultrametric case), is of dimensions n x (n - 1).

D, of necessity different in values from case 1, is of dimensions (n - 1) x n.

$S_{n-1}$, of necessity different in values from case 1, is of dimensions n x n.

8.4 The Inverse Transform 

In both cases considered (viz., ultrametric, and set of sets) the forward 
and inverse transforms are performed in the same way. The algorithms are 
identical - the inputs alone differ. 

The inverse transform allows exact reconstruction of the input data. We
begin with $s_{n-1}$. If this root node has subnodes q and q', we use d(q) and
d(q') to form s(q) and s(q').

8.5 Wavelet Filtering 

Setting wavelet coefficients to zero and then reconstructing the data is re- 
ferred to as hard thresholding (in wavelet space) and this is also termed 
wavelet smoothing or regression. See the companion paper, Paper I, for 
discussion and examples. 
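
The inverse transform of section 8.4 and the hard thresholding just described can be sketched together. The example assumes the detail is stored at the parent node, as in equation (14), so that the two children are recovered as $s(q'') + d(q'')$ and $s(q'') - d(q'')$. The numerical inputs are the coefficients obtained for toy scalar data $x_i = i$ on Figure 1; the threshold value 0.5 is arbitrary.

```python
MERGES = [
    ("x1", "x2"), ("q1", "x3"), ("x4", "x5"), ("q3", "x6"),
    ("q2", "q4"), ("x7", "x8"), ("q5", "q6"),
]


def inverse(s_root, detail):
    """Rebuild the terminal values from the root smooth and the details,
    visiting nodes from the root down to the smallest cluster."""
    smooth = {f"q{len(MERGES)}": s_root}
    for j in range(len(MERGES), 0, -1):
        left, right = MERGES[j - 1]
        s, d = smooth[f"q{j}"], detail[f"q{j}"]
        smooth[left] = s + d      # left child carries the +1 label
        smooth[right] = s - d     # right child carries the -1 label
    return [smooth[f"x{i}"] for i in range(1, 9)]


def hard_threshold(detail, threshold):
    """Set to zero every detail coefficient of magnitude <= threshold."""
    return {q: (d if abs(d) > threshold else 0.0) for q, d in detail.items()}


# Coefficients of the toy data x_i = i (scalar case).
s_root = 5.625
detail = {"q1": -0.5, "q2": -0.75, "q3": -0.5, "q4": -0.75,
          "q5": -1.5, "q6": -0.5, "q7": -1.875}

exact = inverse(s_root, detail)
filtered = inverse(s_root, hard_threshold(detail, 0.5))
```

Zeroing the details of $q_1$, $q_3$ and $q_6$ replaces each of those pairs of terminals by its mean, while the rest of the data is reconstructed unchanged.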

8.6 Hierarchic Wavelet Transform in Matrix Form 

We will look at the ultrametric case. The matrix generalization of equation
(10) is:

$$X = CP \qquad (16)$$

Matrix P is formed from the vectors p of equation (10) by replicating
rows.

Now the wavelet transform gives us: $X = CD + S_{n-1}$. Each (replicated)
row of matrix $S_{n-1}$ is a particular measure of central tendency.
Centering X relative to this gives:

$$X - S_{n-1} = CD \qquad (17)$$

We conclude from the formal similarity of expressions (16) and (17): the
initial p-adic encoding of our data vectors has been mapped into a wavelet
encoding by the wavelet transform.

With reference to section 7 we note that relation (17) furnishes a unique
matrix representation of a dendrogram.

9 Discussion and Conclusions 

Generalization to regular p-way trees, for p > 2, may also be considered.
For p = 3 a natural wavelet function is derived from the triangle scaling
function (Starck et al., 1998), which is itself a convolution of a box function
(the scaling function defining the Haar transform, used in this article) with
itself. The Haar scaling function used above was $(\frac{1}{2}, \frac{1}{2})$. Convolving this
with itself gives then the scaling function $(\frac{1}{4}, \frac{1}{2}, \frac{1}{4})$. Convolving the
triangle function again with itself gives the $B_3$ spline scaling function,
$(\frac{1}{16}, \frac{1}{4}, \frac{3}{8}, \frac{1}{4}, \frac{1}{16})$, which is particularly natural for the analysis of a
5-way, p = 5, tree.
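
These filter values can be verified by direct discrete convolution. In the sketch below the five-tap $B_3$ filter is obtained by convolving the triangle filter with itself, which amounts to four box filters in all; `convolve` is a small helper written for this example.

```python
from fractions import Fraction


def convolve(a, b):
    """Full discrete convolution of two finite sequences."""
    out = [Fraction(0)] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj
    return out


box = [Fraction(1, 2), Fraction(1, 2)]      # Haar scaling filter
triangle = convolve(box, box)               # three taps
b3_spline = convolve(triangle, triangle)    # five taps
```

Each filter sums to one, so each remains an averaging (low-pass) operation.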

A remark on implementation follows: the 3-way tree is unfolded at each 
node into two 2-way trees. More generally any regular p-way tree is unfolded 
at each node into p—1 two-way branchings. The wavelet transform algorithm 
described previously is then directly applied. 

We now look at other related work. 

In Khrennikov and Kozyrev (2004) and Kozyrev (2001) the Haar wavelet
transform, defined on binary trees, was also introduced and discussed.
Compared to the notation used here, the descriptions are related through a p-adic
change of variable (viz., $\sum_{i=0}^{\infty} a_i p^i$ mapped onto $\sum_{i=0}^{\infty} a_i p^{-i-1}$).

For the p = 2 case, a convenient notational expression is given by the
Vladimirov operator (see Avetisov, Bikulov, Kozyrev and Osipov, 2002)
which is a modified differentiation operator. The Vladimirov operator is a
p-adically expressed derivative for an ultrametric space with linearly related
hierarchical levels, $\nu$. In Kozyrev (2001, 2003) it is shown how the Haar
wavelets are eigenfunctions of the Vladimirov operator. As a consequence, the
hierarchical Haar wavelet transform is a spectral analysis of the Vladimirov
operator.

Our work differs from the works cited in the following way. Firstly, 
these other works deal with regular p-way trees. Degeneracies are allowed, 
which can cater for the irregular p-way trees that we have considered. We 
have preferred to directly address the dendrogram data structure, given that 
it models observed data well. Secondly, these other works cater for infinite 
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trees. We have restricted ourselves to a more curtailed problem, with the aim 
of having a straightforward implementation, and with the aim of targeting 
the analysis of practical, constructive data analysis problems. 

We have also been more focused in this work compared to the general set- 
ting described in comprehensive depth by Benedetto and Benedetto (2004). 

An important reason for considering dendrograms rather than infinite
regular trees is that the former setting gives rise to (low order) polynomially
bounded algorithms for all operations; whereas the latter, in the general case,
are not polynomially bounded.

A final path for future work will be noted. The Haar wavelet transform
on a dendrogram (H) gives us information on the rate of change of the
clusters (q), with respect to the level index of each cluster ($\nu$). In a sense
this Haar wavelet transform is the derivative of H with respect to $\nu$. This
perspective may be of benefit when dealing with the dynamics of ultrametric
spaces (Avetisov et al., 2002, and references therein; Kuhlmann, 2002).
Appendix 1. Haar Wavelet Transform Used in Image/Signal Processing

Classically, the Haar wavelet function basis for analysis of $L^2(\mathbb{R}^m)$ is
determined by inducing the basis from an m-dimensional pixel (time step, voxel,
etc.) grid, $\mathbb{Z}^m$. Basis functions of a space denoted by $V_j$ are defined from a
scaling function $\phi$ as follows (Starck, Murtagh and Bijaoui, 1998):

$$\phi_{j,i}(x) = \phi(2^{-j}x - i), \quad i = 0, \ldots, 2^j - 1, \quad \text{with } \phi(x) = \begin{cases} 1 & 0 \le x < 1 \\ 0 & \text{otherwise} \end{cases} \qquad (18)$$

The functions $\phi$ are all box functions, defined on the interval [0, 1) and
are piecewise constant on $2^j$ subintervals. We can approximate any function
in spaces $V_j$ associated with basis functions $\phi_j$ in a very fine manner for $V_0$
(in this case of $V_0$, all values), more crudely for $V_{j+1}$ and so on. We consider
the nesting of spaces, $\ldots V_{j+1} \subset V_j \subset V_{j-1} \ldots \subset V_0$. Equation (18) directly
leads to a dyadic analysis.

Next we consider the orthogonal complement of $V_{j+1}$ in $V_j$, and call it
$W_{j+1}$. The basis functions for $W_j$ are derived from the Haar wavelet. We
find

$$\psi_{j,i}(x) = \psi(2^{-j}x - i), \quad i = 0, \ldots, 2^j - 1, \quad \text{with } \psi(x) = \begin{cases} 1 & 0 \le x < \frac{1}{2} \\ -1 & \frac{1}{2} \le x < 1 \\ 0 & \text{otherwise} \end{cases} \qquad (19)$$

This leads to the basis for $V_j$ as being equal to: the basis for $V_{j+1}$ together
with the basis for $W_{j+1}$. In practice we use this finding like this: we write a
given function in terms of basis functions in $V_j$; then we rewrite in terms of
basis functions in $V_{j+1}$ and $W_{j+1}$; and then we rewrite the former to yield,
overall, an expression in terms of basis functions in $V_{j+2}$, $W_{j+2}$ and $W_{j+1}$.
The wavelet parts provide the detail part, and the space $V_{j+2}$ provides the
smooth part.

For the definitions of scaling function and wavelet function in the case 
of the Haar wavelet transform, proceeding from the given signal, the spaces 
Vj are formed by averaging of pairs of adjacent values, and the spaces Wj 
are formed by differencing of pairs of adjacent values. Proceeding in this 
direction, from the given signal, we see that application of the scaling or 
wavelet functions involves downsampling of the data. The low-pass filter is 
a moving average. The high-pass filter is a moving difference. Other low- 
and high-pass filters are alternatively used to yield other wavelet transforms. 
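
The averaging, differencing and downsampling just described can be sketched for a one-dimensional signal. The example assumes a signal whose length is a power of two and uses the unnormalized filters (average and half-difference); the function names are illustrative.

```python
def haar_step(signal):
    """One resolution level: average and half-difference of adjacent
    pairs, which halves the number of samples (downsampling)."""
    pairs = list(zip(signal[0::2], signal[1::2]))
    smooth = [(a + b) / 2 for a, b in pairs]
    detail = [(a - b) / 2 for a, b in pairs]
    return smooth, detail


def haar_transform(signal):
    """Repeat haar_step on the smooth part until one value remains.
    Assumes len(signal) is a power of two."""
    details = []
    smooth = list(signal)
    while len(smooth) > 1:
        smooth, detail = haar_step(smooth)
        details.append(detail)
    return smooth[0], details


final_smooth, details = haar_transform([9, 7, 3, 5])
```

Here `details[0]` holds the finest-scale differences and the last entry the coarsest; the final smooth is the mean of the signal.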



Appendix 2. Hierarchy, Binary Tree and Ultrametric Topology

A hierarchy, H, is defined as a binary, rooted, unlabeled, node-ranked tree,
also termed a dendrogram (Benzecri, 1979; Johnson, 1967; Lerman, 1981;
Murtagh, 1985). A hierarchy defines a set of embedded subsets of a given
set, I. However these subsets are totally ordered by an index function $\nu$,
which is a stronger condition than the partial order required by the subset
relation. A bijection exists between a hierarchy and an ultrametric space.

Let us show these equivalences between embedded subsets, hierarchy,
and binary tree, through the constructive approach of inducing H on a set I.

Hierarchical agglomeration on n observation vectors, $i \in I$, involves a
series of 1, 2, ..., n - 1 pairwise agglomerations of observations or clusters,
with the following properties. A hierarchy $H = \{q \mid q \in 2^I\}$ such that (i)
$I \in H$, (ii) $i \in H\ \forall i$, and (iii) for each $q \in H, q' \in H : q \cap q' \neq \emptyset \implies
q \subset q'$ or $q' \subset q$. Here we have denoted the power set of set I by $2^I$. An
indexed hierarchy is the pair $(H, \nu)$ where the positive function defined on
H, i.e., $\nu : H \to \mathbb{R}^+$, satisfies: (i) $\nu(i) = 0$ if $i \in H$ is a singleton; and (ii)
$q \subset q' \implies \nu(q) < \nu(q')$. Here we have denoted the positive reals, including
0, by $\mathbb{R}^+$. Function $\nu$ is the agglomeration level. Take $q \subset q'$, let $q \subset q''$ and
$q' \subset q''$, and let $q''$ be the lowest level cluster for which this is true. Then if
we define $D(q, q') = \nu(q'')$, D is an ultrametric. In practice, we start with a
Euclidean or other dissimilarity, use some criterion such as minimizing the
change in variance resulting from the agglomerations, and then define $\nu(q)$
as the dissimilarity associated with the agglomeration carried out.
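
The induced ultrametric can be illustrated on the hierarchy of Figure 1. The sketch assumes, purely for illustration, that the level of cluster $q_j$ is $\nu(q_j) = j$; it defines D between two objects as the level of the lowest cluster containing both and checks the strong triangle inequality over all triples.

```python
from itertools import product

# Clusters of Figure 1, keyed by node number; objects are 1 .. 8.
CLUSTERS = {
    1: {1, 2}, 2: {1, 2, 3}, 3: {4, 5}, 4: {4, 5, 6},
    5: {1, 2, 3, 4, 5, 6}, 6: {7, 8}, 7: {1, 2, 3, 4, 5, 6, 7, 8},
}
# Assumed agglomeration levels: nu(q_j) = j (any increasing values work).
LEVEL = {j: float(j) for j in CLUSTERS}


def D(a, b):
    """Level of the lowest cluster containing both objects; 0 if a == b."""
    if a == b:
        return 0.0
    return min(LEVEL[j] for j, q in CLUSTERS.items() if a in q and b in q)


objects = range(1, 9)
is_ultrametric = all(
    D(a, c) <= max(D(a, b), D(b, c))
    for a, b, c in product(objects, repeat=3)
)
```

Every triangle formed this way is isosceles with small base or equilateral, which is the geometric form of the inequality.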

Appendix 3: P-Adic Numbers 

P-adic numbers were introduced by Kurt Hcnscl in 1898. The ultrametric 
topology was introduced by Marc Krasner (1944), the ultrametric inequal- 
ity having been formulated by Hausdorff in 1934. Essential motivation for 
the study of this area is provided by Schikhof (1984) as follows. Real and 
complex fields gave rise to the idea of studying any field K with a complete 
valuation |.| comparable to the absolute value function. Such fields satisfy 
the "strong triangle inequality" |ic + y| < max(lx|, |y|). Given a valued field, 
defining a totally ordered Abelian group, an ultrametric space is induced 
through \x — y \ = d{x,y). Various terms are used interchangeably for anal- 
ysis in and over such fields such as p-adic, ultrametric, non- Archimedean, 
and isosceles. The natural geometric ordering of metric valuations is on the 
real line, whereas in the ultrametric case the natural ordering is a hierarchi- 
cal tree. P-adic numbers, which provide an analytic version of ultrametric 
topologies, have a crucially important property resulting from Ostrowski's 
theorem: Each non-trivial valuation on the field of the rational numbers is 
equivalent either to the absolute value function or to some p-adic valuation 
(Schikhof, 1984, p. 22). Essentially this theorem states that the rationals 
can be expressed in terms of (continuous) reals, or (discrete) p-adic numbers, 
and no other alternative system. 

The p-adic numbers are base p numbers, where p is a prime number. It
can be shown that the reals can be expressed as p-adic numbers where p is
infinity. The question then arises as to whether any one of
$p = 2, 3, 5, 7, 11, \dots, \infty$ can be preferred. For want of justification to limit attention to one
or a few values of p, taking them all gives rise to the adelic number system
(Brekke and Freund, 1993).
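
As an illustration of the "base p" reading, the following sketch computes the
leading base-p digits of a rational $a/b$ whose denominator is not divisible
by $p$. The function name and the digit order (least significant first) are
choices made for this example, and `pow` with a negative exponent assumes
Python 3.8 or later.

```python
def p_adic_digits(a, b, p, n):
    """First n base-p digits (least significant first) of a/b.

    Assumes p is prime and does not divide b, so b is invertible mod p**n.
    """
    modulus = p ** n
    # a/b modulo p**n, via the modular inverse of b.
    residue = (a * pow(b, -1, modulus)) % modulus
    digits = []
    for _ in range(n):
        digits.append(residue % p)
        residue //= p
    return digits
```

With this convention, `p_adic_digits(-1, 1, 2, 5)` gives `[1, 1, 1, 1, 1]`:
in the 2-adic numbers, $-1$ is the expansion with every digit equal to 1.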



Appendix 4: Some Properties of Ultrametric Spaces 



See elsewhere for the basic ultrametric inequality, and the triangle property:
every triangle is isosceles with small base, or equilateral. The following is
based on Lerman (1981), chapter 0, part IV.

Theorem 1: Every point of a circle in an ultrametric space is a center of 
the circle. 

Proof 1: it suffices to consider the triangle a,b,x, where a is the center 
of the given circle, b is an element of this circle, and x is an element of 
the circle with the same radius but with center b. This triangle is isosceles. 
From the triangle property the result follows. 

Corollary 1: Two circles of the same radius, that are not disjoint, are
identical (they overlap completely).

Definition 1: A divisor of the ultrametric space, $E$, is an equivalence
relation $D$ satisfying $\forall a, b, x, y \in E$: $aDb$ and $d(x, y) \le d(a, b)$
$\Longrightarrow xDy$.

Corollary 2: Circles of the same radius form a partition of the ultrametric 
set. The corresponding equivalence is a divisor of the space. 
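
Theorem 1 and Corollaries 1 and 2 can be checked numerically on a small
example. The sketch below is an illustration under assumptions, not part of
Lerman's treatment: it uses the 2-adic distance on the integers 0 to 15 as the
ultrametric, and it reads "circle" as a closed ball of the given radius.

```python
def dist2(x, y):
    """2-adic distance on integers: 2**(-v), v = times 2 divides x - y."""
    if x == y:
        return 0.0
    diff, v = abs(x - y), 0
    while diff % 2 == 0:
        diff //= 2
        v += 1
    return 2.0 ** (-v)


E = range(16)


def circle(center, radius):
    """Closed ball around `center` (our reading of 'circle')."""
    return frozenset(y for y in E if dist2(center, y) <= radius)


radius = 0.25
# Theorem 1: every point of a circle is a center of that circle.
every_point_is_center = all(circle(b, radius) == circle(a, radius)
                            for a in E for b in circle(a, radius))
# Corollaries 1 and 2: circles of one radius coincide or are disjoint,
# so the distinct circles partition E.
classes = {circle(a, radius) for a in E}
```

Here the circles of radius 0.25 are the four residue classes modulo 4, so
`classes` has four members of four points each.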

Definition 2: A valuation of a divisor $D$ of the space $E$ is the number
$\nu(D) = \sup_{xDy} d(x, y)$.

Corollary 3: If $D$ and $D'$ are two divisors in $E$, a finite metric space,
verifying $D < D'$, then $\nu(D) < \nu(D')$, and conversely.

Theorem 2: If $C$ and $C'$ are disjoint circles in $E$, the distance $d(x, y)$
between an $x \in C$ and a $y \in C'$ depends on $C$ and $C'$ only, and not on
$x$ and $y$.

Proof 2: Consider the triangles $x, x', y$, where $x' \in C$, and apply the
ultrametric triangle relationship.
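
A small numerical check of Theorem 2, again as a sketch under assumptions: the
ultrametric is the 2-adic distance on the integers 0 to 15, "circle" is read
as a closed ball, and the helper names are hypothetical.

```python
def dist2(x, y):
    """2-adic distance on integers: 2**(-v), v = times 2 divides x - y."""
    if x == y:
        return 0.0
    diff, v = abs(x - y), 0
    while diff % 2 == 0:
        diff //= 2
        v += 1
    return 2.0 ** (-v)


E = range(16)


def circle(center, radius):
    """Closed ball around `center` (our reading of 'circle')."""
    return frozenset(y for y in E if dist2(center, y) <= radius)


def distances_between(c1, c2):
    """Set of all distances d(x, y) with x in c1 and y in c2."""
    return {dist2(x, y) for x in c1 for y in c2}


C = circle(0, 0.25)         # multiples of 4
C_prime = circle(2, 0.25)   # integers congruent to 2 modulo 4
C_second = circle(1, 0.25)  # integers congruent to 1 modulo 4
```

For these disjoint circles, `distances_between` returns a single value in each
case, so the distance depends only on the pair of circles.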

Corollary 4: The quotient $E/D$ of an ultrametric space by a divisor is an
ultrametric space. The distance between two of its points is strictly greater
than $\nu(D)$ in the finite case.

Definition 3: An ultrametric proximity is a positive (possibly infinite)
function $p : E \times E \to \mathbb{R}^{+} \cup \{+\infty\}$, verifying (i) $p(y, x) = p(x, y)$; (ii)
$p(x, y) = +\infty$ iff $x = y$; and (iii) $p(x, z) \ge \min(p(x, y), p(y, z))$.

Corollary 5: If $d$ is an ultrametric distance, then $-\log d$ is an ultrametric
proximity. If $p$ is an ultrametric proximity, then $\exp(-p)$ is an ultrametric
distance.
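
Corollary 5 can be illustrated with a short check. The sketch assumes the
2-adic distance on the integers 0 to 7 as the ultrametric (all distances are
at most 1, so $-\log d$ stays non-negative); the function names are
illustrative only.

```python
import math
from itertools import product


def dist2(x, y):
    """2-adic distance on integers: 2**(-v), v = times 2 divides x - y."""
    if x == y:
        return 0.0
    diff, v = abs(x - y), 0
    while diff % 2 == 0:
        diff //= 2
        v += 1
    return 2.0 ** (-v)


def proximity(x, y):
    """p = -log d, with p = +infinity when d = 0."""
    d = dist2(x, y)
    return math.inf if d == 0 else -math.log(d)


def distance_from_proximity(p):
    """Inverse direction of the corollary: d = exp(-p)."""
    return math.exp(-p)


E = range(8)
symmetric = all(proximity(x, y) == proximity(y, x)
                for x, y in product(E, repeat=2))
infinite_only_on_diagonal = all((proximity(x, y) == math.inf) == (x == y)
                                for x, y in product(E, repeat=2))
min_inequality = all(proximity(x, z) >= min(proximity(x, y), proximity(y, z))
                     for x, y, z in product(E, repeat=3))
```

The three flags correspond to conditions (i), (ii) and (iii) of Definition 3.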

Theorem 3: For an $n \times n$ matrix of positive reals, symmetric with respect
to the principal diagonal, to be a matrix of distances associated with an
ultrametric distance on $E$, a necessary and sufficient condition is that a
permutation of rows and columns satisfies the following form of the matrix:

1. Above the diagonal term, equal to 0, the elements of the same row are
non-decreasing.

2. For every index $k$, if

$$d(k, k+1) = d(k, k+2) = \dots = d(k, k+\ell+1)$$

then

$$d(k+1, j) \le d(k, j) \quad \text{for } k+1 < j \le k+\ell+1$$

and

$$d(k+1, j) = d(k, j) \quad \text{for } j > k+\ell+1.$$

Under these circumstances, $\ell \ge 0$ is the length of the section, beginning
beyond the principal diagonal, of the interval of columns of equal terms
in row $k$.

Proof 4: Follows from ultrametric triangle inequality. See Lerman (1981), 
p. 50. 
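
The two conditions of Theorem 3 can be tested on a matrix in a given row and
column order. The following is a minimal sketch under assumptions: the
function names are hypothetical, the matrix is a list of lists, the run of
equal terms in row k is taken to start at column k + 1, and no search over
permutations is attempted.

```python
from itertools import permutations


def is_ultrametric(d):
    """Check the strong triangle inequality on every ordered triple."""
    n = len(d)
    return all(d[i][k] <= max(d[i][j], d[j][k])
               for i, j, k in permutations(range(n), 3))


def has_ordered_form(d):
    """Check conditions 1 and 2 for rows and columns in their given order."""
    n = len(d)
    for k in range(n - 1):
        row = d[k][k + 1:]
        # Condition 1: entries to the right of the diagonal are non-decreasing.
        if any(row[t] > row[t + 1] for t in range(len(row) - 1)):
            return False
        # Length of the initial run of entries equal to d(k, k+1).
        run = 1
        while run < len(row) and row[run] == row[0]:
            run += 1
        last_equal = k + run  # column index k + ell + 1 in the text
        # Condition 2: compare row k + 1 with row k, column by column.
        for j in range(k + 2, n):
            if j <= last_equal:
                if d[k + 1][j] > d[k][j]:
                    return False
            elif d[k + 1][j] != d[k][j]:
                return False
    return True


# Ultrametric from a toy dendrogram: {0,1} at 1, {2,3} at 2, all at 3.
D = [[0, 1, 3, 3],
     [1, 0, 3, 3],
     [3, 3, 0, 2],
     [3, 3, 2, 0]]
order = (0, 2, 1, 3)  # a permutation that breaks the ordered form
D_permuted = [[D[a][b] for b in order] for a in order]
```

The matrix `D` passes the check in its given order, while `D_permuted`, the
same ultrametric with rows and columns reordered, fails it; this is why the
theorem is stated up to a permutation.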

Theorem 5: In an ultrametric topology, every ball is both open and 
closed (termed clopen). 

(The empty set and the universal set are both clopen. The complement 
of a clopen set is clopen. Finite unions and intersections of clopen sets are 
clopen.) 

From Chakraborty (2004): 

A basic neighborhood of $x$, of radius $r$, is the set $N(x, r) = \{y \in X :
d(x, y) < r\}$. An open set, $U \subset X$, is a union of basic neighborhoods, i.e.
$\forall x \in U, \exists r = r(x) > 0$ s.t. $N(x, r) \subset U$.

Many sets are open and closed at the same time. This property is relative
to subspaces. Let $(X, d)$ be a metric space and $Y \subset X$. If $y \in Y$ and $r > 0$,
let $N_Y(y, r)$ denote the basic neighborhood of $y$ in $Y$, and $N_X(y, r)$ the basic
neighborhood of $y$ in $X$. Then $N_Y(y, r) = N_X(y, r) \cap Y$. It follows that a set
$U \subset Y$ is open in $Y$ iff there exists an open set $V$ in $X$ s.t. $U = V \cap Y$. An analogous
statement holds for closed sets. If $U \subset Y \subset X$ then $U$ can be open (or
closed) in $Y$ without being open (or closed) in $X$.

Appendix 5: Ultrametric Spaces are 0-Dimensional 

Informally, a set of points is of necessity 0-dimensional. 
From Chakraborty (2004): 

A base $B$ for the topology $T$ is such that $B \subset T$, and every element of
$T$ is a union of elements from $B$.

A metric space $(X, d)$ is called 0-dimensional if $\forall x \in X$, $r > 0$, $\exists$ a set $U$,
which is clopen, with $x \in U \subset N(x, r)$.

Van Rooij (1978): a topology is 0-dimensional if it has a base consisting
of clopen sets. I.e., if for every $a \in X$ and for every closed $A \subset X$ that does
not contain $a$, there exists a clopen set $U$ such that $a \in U$, $A \subset X \setminus U$.